-- Logs begin at Mon 2019-12-09 06:12:24 PST, end at Tue 2020-03-17 14:16:01 PDT. --
Mar 08 14:05:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 23 previous similar messages
Mar 08 14:06:49 fir-io7-s1 kernel: LNetError: 49627:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 14:06:49 fir-io7-s1 kernel: LNetError: 49627:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 231 previous similar messages
Mar 08 14:08:01 fir-io7-s1 kernel: LNetError: 50372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 14:08:01 fir-io7-s1 kernel: LNetError: 50372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages
Mar 08 14:13:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds
Mar 08 14:13:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 447 previous similar messages
Mar 08 14:15:17 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 14:15:17 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 14:16:49 fir-io7-s1 kernel: LNetError: 49779:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 14:16:49 fir-io7-s1 kernel: LNetError: 49779:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages
Mar 08 14:18:03 fir-io7-s1 kernel: LNetError: 50372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 14:18:03 fir-io7-s1 kernel: LNetError: 50372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages
Mar 08 14:23:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 7 seconds
Mar 08 14:23:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 491 previous similar messages
Mar 08 14:25:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125
Mar 08 14:25:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 21 previous similar messages
Mar 08 14:26:54 fir-io7-s1 kernel: LNetError: 50372:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 14:26:54 fir-io7-s1 kernel: LNetError: 50372:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages
Mar 08 14:28:05 fir-io7-s1 kernel: LNetError: 51103:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 14:28:05 fir-io7-s1 kernel: LNetError: 51103:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages
Mar 08 14:33:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds
Mar 08 14:33:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 593 previous similar messages
Mar 08 14:35:22 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 14:35:22 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages
Mar 08 14:36:59 fir-io7-s1 kernel: LNetError: 51103:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 14:36:59 fir-io7-s1 kernel: LNetError: 51103:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages
Mar 08 14:38:07 fir-io7-s1 kernel: LNetError: 51456:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 14:38:07 fir-io7-s1 kernel: LNetError: 51456:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages
Mar 08 14:43:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 6 seconds
Mar 08 14:43:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 479 previous similar messages
Mar 08 14:45:22 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 14:45:22 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 14:46:59 fir-io7-s1 kernel: LNetError: 51456:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 14:46:59 fir-io7-s1 kernel: LNetError: 51456:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages
Mar 08 14:48:09 fir-io7-s1 kernel: LNetError: 51807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 14:48:09 fir-io7-s1 kernel: LNetError: 51807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 342 previous similar messages
Mar 08 14:53:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 2 seconds
Mar 08 14:53:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 402 previous similar messages
Mar 08 14:55:27 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 14:55:27 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages
Mar 08 14:56:59 fir-io7-s1 kernel: LNetError: 51880:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 14:56:59 fir-io7-s1 kernel: LNetError: 51880:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages
Mar 08 14:58:09 fir-io7-s1 kernel: LNetError: 51807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error
Mar 08 14:58:09 fir-io7-s1 kernel: LNetError: 51807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 334 previous similar messages
Mar 08 15:03:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds
Mar 08 15:03:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 444 previous similar messages
Mar 08 15:05:27 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 15:05:27 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 15:06:59 fir-io7-s1 kernel: LNetError: 52199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 15:06:59 fir-io7-s1 kernel: LNetError: 52199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages
Mar 08 15:08:13 fir-io7-s1 kernel: LNetError: 52199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 15:08:13 fir-io7-s1 kernel: LNetError: 52199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages
Mar 08 15:13:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds
Mar 08 15:13:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 495 previous similar messages
Mar 08 15:15:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 15:15:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages
Mar 08 15:17:04 fir-io7-s1 kernel: LNetError: 52199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 15:17:04 fir-io7-s1 kernel: LNetError: 52199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 249 previous similar messages
Mar 08 15:18:16 fir-io7-s1 kernel: LNetError: 52199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 15:18:16 fir-io7-s1 kernel: LNetError: 52199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages
Mar 08 15:24:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds
Mar 08 15:24:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages
Mar 08 15:25:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 15:25:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages
Mar 08 15:27:04 fir-io7-s1 kernel: LNetError: 52199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 15:27:04 fir-io7-s1 kernel: LNetError: 52199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages
Mar 08 15:28:18 fir-io7-s1 kernel: LNetError: 52199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 15:28:18 fir-io7-s1 kernel: LNetError: 52199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 426 previous similar messages
Mar 08 15:34:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds
Mar 08 15:34:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 330 previous similar messages
Mar 08 15:35:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 15:35:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages
Mar 08 15:37:04 fir-io7-s1 kernel: LNetError: 53375:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 15:37:04 fir-io7-s1 kernel: LNetError: 53375:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages
Mar 08 15:38:19 fir-io7-s1 kernel: LNetError: 53576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 15:38:19 fir-io7-s1 kernel: LNetError: 53576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 467 previous similar messages
Mar 08 15:44:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds
Mar 08 15:44:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 88 previous similar messages
Mar 08 15:45:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 15:45:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages
Mar 08 15:46:19 fir-io7-s1 kernel: LustreError: 40801:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0052: cli 65fedb94-cd76-4 claims 28672 GRANT, real grant 24576
Mar 08 15:47:04 fir-io7-s1 kernel: LNetError: 52402:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 15:47:04 fir-io7-s1 kernel: LNetError: 52402:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages
Mar 08 15:48:19 fir-io7-s1 kernel: LNetError: 53764:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 15:48:19 fir-io7-s1 kernel: LNetError: 53764:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages
Mar 08 15:54:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds
Mar 08 15:54:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 116 previous similar messages
Mar 08 15:55:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 15:55:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages
Mar 08 15:57:09 fir-io7-s1 kernel: LNetError: 54218:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 15:57:09 fir-io7-s1 kernel: LNetError: 54218:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages
Mar 08 15:58:23 fir-io7-s1 kernel: LNetError: 53975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 15:58:23 fir-io7-s1 kernel: LNetError: 53975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages
Mar 08 16:05:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds
Mar 08 16:05:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 84 previous similar messages
Mar 08 16:05:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 16:05:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages
Mar 08 16:07:09 fir-io7-s1 kernel: LNetError: 53975:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 16:07:09 fir-io7-s1 kernel: LNetError: 53975:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages
Mar 08 16:08:24 fir-io7-s1 kernel: LNetError: 53975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 16:08:24 fir-io7-s1 kernel: LNetError: 53975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages
Mar 08 16:15:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds
Mar 08 16:15:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 112 previous similar messages
Mar 08 16:15:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 16:15:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 16:17:14 fir-io7-s1 kernel: LNetError: 54353:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 16:17:14 fir-io7-s1 kernel: LNetError: 54353:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages
Mar 08 16:18:28 fir-io7-s1 kernel: LNetError: 54882:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 16:18:28 fir-io7-s1 kernel: LNetError: 54882:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages
Mar 08 16:25:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds
Mar 08 16:25:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 123 previous similar messages
Mar 08 16:25:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 16:25:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages
Mar 08 16:27:14 fir-io7-s1 kernel: LNetError: 55089:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 16:27:14 fir-io7-s1 kernel: LNetError: 55089:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages
Mar 08 16:28:29 fir-io7-s1 kernel: LNetError: 54882:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 16:28:29 fir-io7-s1 kernel: LNetError: 54882:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages
Mar 08 16:35:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds
Mar 08 16:35:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 70 previous similar messages
Mar 08 16:35:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 16:35:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages
Mar 08 16:37:14 fir-io7-s1 kernel: LNetError: 55302:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 16:37:14 fir-io7-s1 kernel: LNetError: 55302:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages
Mar 08 16:38:32 fir-io7-s1 kernel: LNetError: 55405:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 16:38:32 fir-io7-s1 kernel: LNetError: 55405:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages
Mar 08 16:45:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds
Mar 08 16:45:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 192 previous similar messages
Mar 08 16:45:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 16:45:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 16:47:19 fir-io7-s1 kernel: LNetError: 55405:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 16:47:19 fir-io7-s1 kernel: LNetError: 55405:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages
Mar 08 16:48:33 fir-io7-s1 kernel: LNetError: 56062:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 16:48:33 fir-io7-s1 kernel: LNetError: 56062:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages
Mar 08 16:55:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125
Mar 08 16:55:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages
Mar 08 16:56:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds
Mar 08 16:56:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 108 previous similar messages
Mar 08 16:57:19 fir-io7-s1 kernel: LNetError: 56062:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 16:57:19 fir-io7-s1 kernel: LNetError: 56062:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages
Mar 08 16:58:34 fir-io7-s1 kernel: LNetError: 56412:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 16:58:34 fir-io7-s1 kernel: LNetError: 56412:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 475 previous similar messages
Mar 08 17:05:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125
Mar 08 17:05:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages
Mar 08 17:07:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds
Mar 08 17:07:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 83 previous similar messages
Mar 08 17:07:24 fir-io7-s1 kernel: LNetError: 56642:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 17:07:24 fir-io7-s1 kernel: LNetError: 56642:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages
Mar 08 17:08:38 fir-io7-s1 kernel: LNetError: 56642:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 17:08:38 fir-io7-s1 kernel: LNetError: 56642:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages
Mar 08 17:15:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125
Mar 08 17:15:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages
Mar 08 17:17:24 fir-io7-s1 kernel: LNetError: 56642:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 17:17:24 fir-io7-s1 kernel: LNetError: 56642:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages
Mar 08 17:17:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds
Mar 08 17:17:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 155 previous similar messages
Mar 08 17:18:39 fir-io7-s1 kernel: LNetError: 57127:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 17:18:39 fir-io7-s1 kernel: LNetError: 57127:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages
Mar 08 17:25:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125
Mar 08 17:25:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 17:27:24 fir-io7-s1 kernel: LNetError: 57127:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 17:27:24 fir-io7-s1 kernel: LNetError: 57127:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages
Mar 08 17:28:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds
Mar 08 17:28:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 87 previous similar messages
Mar 08 17:28:39 fir-io7-s1 kernel: LNetError: 57474:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 17:28:39 fir-io7-s1 kernel: LNetError: 57474:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 472 previous similar messages
Mar 08 17:35:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125
Mar 08 17:35:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages
Mar 08 17:37:29 fir-io7-s1 kernel: LNetError: 57676:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 17:37:29 fir-io7-s1 kernel: LNetError: 57676:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages
Mar 08 17:38:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds
Mar 08 17:38:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 87 previous similar messages
Mar 08 17:38:44 fir-io7-s1 kernel: LNetError: 57676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error
Mar 08 17:38:44 fir-io7-s1 kernel: LNetError: 57676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages
Mar 08 17:45:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125
Mar 08 17:45:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages
Mar 08 17:47:29 fir-io7-s1 kernel: LNetError: 57874:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 17:47:29 fir-io7-s1 kernel: LNetError: 57874:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages
Mar 08 17:48:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds
Mar 08 17:48:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 176 previous similar messages
Mar 08 17:48:44 fir-io7-s1 kernel: LNetError: 58184:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 17:48:44 fir-io7-s1 kernel: LNetError: 58184:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages
Mar 08 17:55:57 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125
Mar 08 17:55:57 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 17:57:29 fir-io7-s1 kernel: LNetError: 58184:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 17:57:29 fir-io7-s1 kernel: LNetError: 58184:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages
Mar 08 17:58:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds
Mar 08 17:58:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 82 previous similar messages
Mar 08 17:58:44 fir-io7-s1 kernel: LNetError: 58184:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error
Mar 08 17:58:44 fir-io7-s1 kernel: LNetError: 58184:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 469 previous similar messages
Mar 08 18:06:00 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125
Mar 08 18:06:00 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages
Mar 08 18:07:34 fir-io7-s1 kernel: LNetError: 58807:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 18:07:34 fir-io7-s1 kernel: LNetError: 58807:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages
Mar 08 18:08:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds
Mar 08 18:08:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 73 previous similar messages
Mar 08 18:08:49 fir-io7-s1 kernel: LNetError: 58807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 18:08:49 fir-io7-s1 kernel: LNetError: 58807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages
Mar 08 18:16:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125
Mar 08 18:16:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 18:17:34 fir-io7-s1 kernel: LNetError: 59259:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 18:17:34 fir-io7-s1 kernel: LNetError: 59259:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages
Mar 08 18:18:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds
Mar 08 18:18:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 172 previous similar messages
Mar 08 18:18:49 fir-io7-s1 kernel: LNetError: 59259:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 18:18:49 fir-io7-s1 kernel: LNetError: 59259:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages
Mar 08 18:23:45 fir-io7-s1 kernel: LustreError: 40237:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli c404f1c3-1f95-4 claims 28672 GRANT, real grant 0
Mar 08 18:26:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125
Mar 08 18:26:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages
Mar 08 18:27:34 fir-io7-s1 kernel: LNetError: 59487:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 18:27:34 fir-io7-s1 kernel: LNetError: 59487:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages
Mar 08 18:28:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 1 seconds
Mar 08 18:28:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 72 previous similar messages
Mar 08 18:28:49 fir-io7-s1 kernel: LNetError: 59487:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 18:28:49 fir-io7-s1 kernel: LNetError: 59487:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages
Mar 08 18:36:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125
Mar 08 18:36:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages
Mar 08 18:37:35 fir-io7-s1 kernel: LNetError: 59487:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 18:37:35 fir-io7-s1 kernel: LNetError: 59487:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages
Mar 08 18:38:50 fir-io7-s1 kernel: LNetError: 59959:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 18:38:50 fir-io7-s1 kernel: LNetError: 59959:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages
Mar 08 18:39:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds
Mar 08 18:39:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 79 previous similar messages
Mar 08 18:46:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 18:46:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages
Mar 08 18:47:35 fir-io7-s1 kernel: LNetError: 57381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 18:47:35 fir-io7-s1 kernel: LNetError: 57381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages
Mar 08 18:48:50 fir-io7-s1 kernel: LNetError: 60224:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 18:48:50 fir-io7-s1 kernel: LNetError: 60224:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages
Mar 08 18:49:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds
Mar 08 18:49:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 184 previous similar messages
Mar 08 18:56:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 18:56:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages
Mar 08 18:57:00 fir-io7-s1 kernel: LustreError: 40788:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli 932170e3-8d55-4 claims 16728064 GRANT, real grant 0
Mar 08 18:57:35 fir-io7-s1 kernel: LNetError: 60224:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900
Mar 08 18:57:35 fir-io7-s1 kernel: LNetError: 60224:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages
Mar 08 18:58:50 fir-io7-s1 kernel: LNetError: 56548:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error
Mar 08 18:58:50 fir-io7-s1 kernel: LNetError: 56548:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 475 previous similar messages
Mar 08 19:00:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds
Mar 08 19:00:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 83 previous similar messages
Mar 08 19:03:10 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to fbed62e0-c2c3-4 (at 10.50.6.35@o2ib2)
Mar 08 19:03:10 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to fbed62e0-c2c3-4 (at 10.50.6.35@o2ib2)
Mar 08 19:03:10 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages
Mar 08 19:06:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125
Mar 08 19:06:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages
Mar 08 19:07:40 fir-io7-s1 kernel: LNetError: 60678:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue.
Health = 900 Mar 08 19:07:40 fir-io7-s1 kernel: LNetError: 60678:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 08 19:08:55 fir-io7-s1 kernel: LNetError: 60666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 08 19:08:55 fir-io7-s1 kernel: LNetError: 60666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages Mar 08 19:10:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 08 19:10:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 63 previous similar messages Mar 08 19:16:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 19:16:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 08 19:17:40 fir-io7-s1 kernel: LNetError: 61333:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 19:17:40 fir-io7-s1 kernel: LNetError: 61333:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 08 19:18:55 fir-io7-s1 kernel: LNetError: 61086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 19:18:55 fir-io7-s1 kernel: LNetError: 61086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages Mar 08 19:20:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 08 19:20:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 224 previous similar messages Mar 08 19:26:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 19:26:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 08 19:27:40 fir-io7-s1 kernel: LNetError: 59913:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 19:27:40 fir-io7-s1 kernel: LNetError: 59913:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 08 19:28:55 fir-io7-s1 kernel: LNetError: 61086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 19:28:55 fir-io7-s1 kernel: LNetError: 61086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages Mar 08 19:30:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 08 19:30:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 92 previous similar messages Mar 08 19:36:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 19:36:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 19:37:41 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 19:37:41 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 08 19:39:00 fir-io7-s1 kernel: LNetError: 61086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 19:39:00 fir-io7-s1 kernel: LNetError: 61086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages Mar 08 19:41:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 08 19:41:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 84 previous similar messages Mar 08 19:46:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 08 19:46:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 08 19:47:45 fir-io7-s1 kernel: LNetError: 62136:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 19:47:45 fir-io7-s1 kernel: LNetError: 62136:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 08 19:49:00 fir-io7-s1 kernel: LNetError: 62435:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 19:49:00 fir-io7-s1 kernel: LNetError: 62435:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages Mar 08 19:51:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 08 19:51:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 193 previous similar messages Mar 08 19:56:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 08 19:56:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 19:57:45 fir-io7-s1 kernel: LNetError: 62677:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 19:57:45 fir-io7-s1 kernel: LNetError: 62677:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 08 19:59:00 fir-io7-s1 kernel: LNetError: 62677:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 19:59:00 fir-io7-s1 kernel: LNetError: 62677:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages Mar 08 20:02:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 08 20:02:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 80 previous similar messages Mar 08 20:06:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 08 20:06:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 20:07:45 fir-io7-s1 kernel: LNetError: 62894:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 20:07:45 fir-io7-s1 kernel: LNetError: 62894:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 08 20:09:00 fir-io7-s1 kernel: LNetError: 62894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 20:09:00 fir-io7-s1 kernel: LNetError: 62894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages Mar 08 20:12:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 08 20:12:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 70 previous similar messages Mar 08 20:16:27 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125 Mar 08 20:16:27 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 08 20:17:50 fir-io7-s1 kernel: LNetError: 63345:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 08 20:17:50 fir-io7-s1 kernel: LNetError: 63345:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 08 20:18:21 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client d68f67f8-cb2d-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6991e5b000, cur 1583723901 expire 1583723751 last 1583723674 Mar 08 20:18:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 08 20:19:05 fir-io7-s1 kernel: LNetError: 63315:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 20:19:05 fir-io7-s1 kernel: LNetError: 63315:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages Mar 08 20:19:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 08 20:19:42 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 08 20:23:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 08 20:23:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 74 previous similar messages Mar 08 20:26:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 08 20:26:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 20:27:50 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 20:27:50 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 08 20:29:05 fir-io7-s1 kernel: LNetError: 63741:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 20:29:05 fir-io7-s1 kernel: LNetError: 63741:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 467 previous similar messages Mar 08 20:33:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 08 20:33:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 351 previous similar messages Mar 08 20:36:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 20:36:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 20:38:00 fir-io7-s1 kernel: LNetError: 64111:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 20:38:00 fir-io7-s1 kernel: LNetError: 64111:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 08 20:39:10 fir-io7-s1 kernel: LNetError: 64111:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 20:39:10 fir-io7-s1 kernel: LNetError: 64111:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 448 previous similar messages Mar 08 20:43:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 08 20:43:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 434 previous similar messages Mar 08 20:46:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 20:46:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 20:48:10 fir-io7-s1 kernel: LNetError: 64510:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 20:48:10 fir-io7-s1 kernel: LNetError: 64510:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 08 20:49:20 fir-io7-s1 kernel: LNetError: 64510:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 20:49:20 fir-io7-s1 kernel: LNetError: 64510:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 401 previous similar messages Mar 08 20:53:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 5 seconds Mar 08 20:53:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 589 previous similar messages Mar 08 20:56:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 08 20:56:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 08 20:58:10 fir-io7-s1 kernel: LNetError: 64975:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 08 20:58:10 fir-io7-s1 kernel: LNetError: 64975:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 08 20:59:20 fir-io7-s1 kernel: LNetError: 64510:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 08 20:59:20 fir-io7-s1 kernel: LNetError: 64510:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 368 previous similar messages Mar 08 21:03:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 4 seconds Mar 08 21:03:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 476 previous similar messages Mar 08 21:08:10 fir-io7-s1 kernel: LNetError: 65078:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 21:08:10 fir-io7-s1 kernel: LNetError: 65078:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 08 21:09:25 fir-io7-s1 kernel: LNetError: 65410:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 08 21:09:25 fir-io7-s1 kernel: LNetError: 65410:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 356 previous similar messages Mar 08 21:11:36 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 08 21:11:36 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 08 21:12:55 fir-io7-s1 kernel: LustreError: 90873:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli b6e74d6f-535a-4 claims 28672 GRANT, real grant 0 Mar 08 21:13:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 08 21:13:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 354 previous similar messages Mar 08 21:18:10 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 21:18:10 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages Mar 08 21:19:25 fir-io7-s1 kernel: LNetError: 65691:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 21:19:25 fir-io7-s1 kernel: LNetError: 65691:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 21:21:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 08 21:21:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 21:23:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 21:23:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 154 previous similar messages Mar 08 21:25:25 fir-io7-s1 kernel: LustreError: 84741:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli e225f3d7-7aff-4 claims 2125824 GRANT, real grant 65536 Mar 08 21:28:10 fir-io7-s1 kernel: LNetError: 65691:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 21:28:10 fir-io7-s1 kernel: LNetError: 65691:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 08 21:29:25 fir-io7-s1 kernel: LNetError: 66128:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 21:29:25 fir-io7-s1 kernel: LNetError: 66128:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 08 21:31:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 08 21:31:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 21:34:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 1 seconds Mar 08 21:34:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 79 previous similar messages Mar 08 21:38:15 fir-io7-s1 kernel: LNetError: 66128:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 21:38:15 fir-io7-s1 kernel: LNetError: 66128:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 08 21:39:30 fir-io7-s1 kernel: LNetError: 66128:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 21:39:30 fir-io7-s1 kernel: LNetError: 66128:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 472 previous similar messages Mar 08 21:41:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 08 21:41:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 21:44:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 08 21:44:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 336 previous similar messages Mar 08 21:48:15 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 21:48:15 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 08 21:49:30 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 21:49:30 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 08 21:51:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 21:51:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 08 21:54:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 21:54:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 152 previous similar messages Mar 08 21:58:15 fir-io7-s1 kernel: LNetError: 66824:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 21:58:15 fir-io7-s1 kernel: LNetError: 66824:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 08 21:59:30 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 21:59:30 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 08 22:01:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 22:01:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 08 22:04:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 08 22:04:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 74 previous similar messages Mar 08 22:08:15 fir-io7-s1 kernel: LNetError: 66824:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 22:08:15 fir-io7-s1 kernel: LNetError: 66824:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 08 22:09:30 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 22:09:30 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 08 22:11:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 08 22:11:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 08 22:14:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 22:14:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 72 previous similar messages Mar 08 22:18:16 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 22:18:16 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 08 22:19:35 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 22:19:35 fir-io7-s1 kernel: LNetError: 66824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 22:21:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 08 22:21:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 08 22:23:44 fir-io7-s1 kernel: LustreError: 68381:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli 4683044c-87cf-4 claims 28672 GRANT, real grant 0 Mar 08 22:24:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 08 22:24:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 247 previous similar messages Mar 08 22:28:20 fir-io7-s1 kernel: LNetError: 66824:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 22:28:20 fir-io7-s1 kernel: LNetError: 66824:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 08 22:29:35 fir-io7-s1 kernel: LNetError: 68439:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 22:29:35 fir-io7-s1 kernel: LNetError: 68439:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 22:31:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 08 22:31:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 8 previous similar messages Mar 08 22:35:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 08 22:35:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 160 previous similar messages Mar 08 22:38:20 fir-io7-s1 kernel: LNetError: 68439:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 22:38:20 fir-io7-s1 kernel: LNetError: 68439:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 08 22:39:35 fir-io7-s1 kernel: LNetError: 69217:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 22:39:35 fir-io7-s1 kernel: LNetError: 69217:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 22:41:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 08 22:41:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 08 22:45:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 22:45:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 90 previous similar messages Mar 08 22:48:20 fir-io7-s1 kernel: LNetError: 69217:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 22:48:20 fir-io7-s1 kernel: LNetError: 69217:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 08 22:49:35 fir-io7-s1 kernel: LNetError: 69579:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 22:49:35 fir-io7-s1 kernel: LNetError: 69579:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages Mar 08 22:51:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 08 22:51:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 08 22:56:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 22:56:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 85 previous similar messages Mar 08 22:58:20 fir-io7-s1 kernel: LNetError: 69834:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 22:58:20 fir-io7-s1 kernel: LNetError: 69834:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 08 22:59:36 fir-io7-s1 kernel: LNetError: 69834:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 22:59:36 fir-io7-s1 kernel: LNetError: 69834:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 472 previous similar messages Mar 08 23:02:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 08 23:02:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 08 23:06:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 08 23:06:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 192 previous similar messages Mar 08 23:08:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 23:08:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 08 23:09:40 fir-io7-s1 kernel: LNetError: 70229:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 08 23:09:40 fir-io7-s1 kernel: LNetError: 70229:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 23:12:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 08 23:12:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 08 23:17:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 08 23:17:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 161 previous similar messages Mar 08 23:18:25 fir-io7-s1 kernel: LNetError: 70229:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 23:18:25 fir-io7-s1 kernel: LNetError: 70229:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 08 23:19:40 fir-io7-s1 kernel: LNetError: 58908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 23:19:40 fir-io7-s1 kernel: LNetError: 58908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 08 23:22:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 08 23:22:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 08 23:28:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 54 seconds Mar 08 23:28:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 87 previous similar messages Mar 08 23:28:25 fir-io7-s1 kernel: LNetError: 70229:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 23:28:25 fir-io7-s1 kernel: LNetError: 70229:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 08 23:29:40 fir-io7-s1 kernel: LNetError: 70229:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 23:29:40 fir-io7-s1 kernel: LNetError: 70229:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 23:32:08 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 23:32:08 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 08 23:38:25 fir-io7-s1 kernel: LNetError: 70229:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 08 23:38:25 fir-io7-s1 kernel: LNetError: 70229:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 08 23:38:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 08 23:38:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 53 previous similar messages Mar 08 23:39:40 fir-io7-s1 kernel: LNetError: 70229:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 23:39:40 fir-io7-s1 kernel: LNetError: 70229:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 23:42:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125 Mar 08 23:42:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 08 23:46:37 fir-io7-s1 kernel: LustreError: 68360:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli 124e16c3-5aff-4 claims 28672 
GRANT, real grant 0 Mar 08 23:48:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 08 23:48:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 08 23:48:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 23:48:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 182 previous similar messages Mar 08 23:49:45 fir-io7-s1 kernel: LNetError: 71528:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 23:49:45 fir-io7-s1 kernel: LNetError: 71528:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 08 23:52:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 08 23:52:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 08 23:58:30 fir-io7-s1 kernel: LNetError: 71768:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 08 23:58:30 fir-io7-s1 kernel: LNetError: 71768:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 08 23:59:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 08 23:59:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 185 previous similar messages Mar 08 23:59:45 fir-io7-s1 kernel: LNetError: 72100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 08 23:59:45 fir-io7-s1 kernel: LNetError: 72100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 09 00:02:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 00:02:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 00:05:53 fir-io7-s1 kernel: LustreError: 68775:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0048: cli 088d9725-dc40-4 claims 9842688 GRANT, real grant 0 Mar 09 00:08:30 fir-io7-s1 kernel: LNetError: 72100:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 00:08:30 fir-io7-s1 kernel: LNetError: 72100:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 09 00:09:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 09 00:09:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 86 previous similar messages Mar 09 00:09:45 fir-io7-s1 kernel: LNetError: 72100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 00:09:45 fir-io7-s1 kernel: LNetError: 72100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 00:12:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 00:12:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 00:12:19 fir-io7-s1 kernel: LustreError: 68880:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli a9fb14eb-f20e-4 claims 28672 GRANT, real grant 0 Mar 09 00:18:30 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 00:18:30 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 09 00:19:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 09 00:19:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 67 previous similar messages Mar 09 00:19:45 fir-io7-s1 kernel: LNetError: 58908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 00:19:45 fir-io7-s1 kernel: LNetError: 58908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 00:22:16 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 00:22:16 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 00:28:35 fir-io7-s1 kernel: LNetError: 72787:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 00:28:35 fir-io7-s1 kernel: LNetError: 72787:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 09 00:29:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 00:29:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 186 previous similar messages Mar 09 00:29:50 fir-io7-s1 kernel: LNetError: 73175:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 00:29:50 fir-io7-s1 kernel: LNetError: 73175:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 00:32:17 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 09 00:32:17 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 00:38:35 fir-io7-s1 kernel: LNetError: 73175:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 00:38:35 fir-io7-s1 kernel: LNetError: 73175:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 09 00:39:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 00:39:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages Mar 09 00:39:50 fir-io7-s1 kernel: LNetError: 73524:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 00:39:50 fir-io7-s1 kernel: LNetError: 73524:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 00:42:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 00:42:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 00:48:35 fir-io7-s1 kernel: LNetError: 73524:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 00:48:35 fir-io7-s1 kernel: LNetError: 73524:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 09 00:49:50 fir-io7-s1 kernel: LNetError: 73912:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 00:49:50 fir-io7-s1 kernel: LNetError: 73912:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 00:50:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 00:50:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 99 previous similar messages Mar 09 00:52:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125 Mar 09 00:52:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 00:58:39 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 00:58:39 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 09 00:59:55 fir-io7-s1 kernel: LNetError: 74104:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 00:59:55 fir-io7-s1 kernel: LNetError: 74104:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 01:01:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 01:01:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 84 previous similar messages Mar 09 01:02:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 01:02:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 01:08:40 fir-io7-s1 kernel: LNetError: 74297:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 01:08:40 fir-io7-s1 kernel: LNetError: 74297:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 09 01:09:55 fir-io7-s1 kernel: LNetError: 74635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 01:09:55 fir-io7-s1 kernel: LNetError: 74635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 01:11:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 01:11:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 243 previous similar messages Mar 09 01:12:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 01:12:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 01:18:40 fir-io7-s1 kernel: LNetError: 74908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 01:18:40 fir-io7-s1 kernel: LNetError: 74908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 09 01:19:55 fir-io7-s1 kernel: LNetError: 74635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 01:19:55 fir-io7-s1 kernel: LNetError: 74635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 01:21:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 09 01:21:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 88 previous similar messages Mar 09 01:22:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 01:22:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 09 01:28:40 fir-io7-s1 kernel: LNetError: 73247:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 01:28:40 fir-io7-s1 kernel: LNetError: 73247:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 09 01:29:55 fir-io7-s1 kernel: LNetError: 75024:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 01:29:55 fir-io7-s1 kernel: LNetError: 75024:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 01:32:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 09 01:32:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 65 previous similar messages Mar 09 01:32:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 01:32:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 01:38:45 fir-io7-s1 kernel: LNetError: 75436:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 01:38:45 fir-io7-s1 kernel: LNetError: 75436:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 09 01:40:00 fir-io7-s1 kernel: LNetError: 75678:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 01:40:00 fir-io7-s1 kernel: LNetError: 75678:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 09 01:42:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 09 01:42:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 199 previous similar messages Mar 09 01:42:36 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 01:42:36 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 21 previous similar messages Mar 09 01:48:45 fir-io7-s1 kernel: LNetError: 73247:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 01:48:45 fir-io7-s1 kernel: LNetError: 73247:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 09 01:50:00 fir-io7-s1 kernel: LNetError: 75965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 01:50:00 fir-io7-s1 kernel: LNetError: 75965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 09 01:52:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 09 01:52:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 151 previous similar messages Mar 09 01:52:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125 Mar 09 01:52:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 01:58:45 fir-io7-s1 kernel: LNetError: 75965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 01:58:45 fir-io7-s1 kernel: LNetError: 75965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 09 02:00:00 fir-io7-s1 kernel: LNetError: 76386:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 02:00:00 fir-io7-s1 kernel: LNetError: 76386:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 474 previous similar messages Mar 09 02:02:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 02:02:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 86 previous similar messages Mar 09 02:02:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 02:02:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 02:08:45 fir-io7-s1 kernel: LNetError: 76386:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 02:08:45 fir-io7-s1 kernel: LNetError: 76386:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 09 02:10:00 fir-io7-s1 kernel: LNetError: 76757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 02:10:00 fir-io7-s1 kernel: LNetError: 76757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 09 02:12:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 02:12:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 02:12:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 09 02:12:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 85 previous similar messages Mar 09 02:18:45 fir-io7-s1 kernel: LNetError: 76757:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 02:18:45 fir-io7-s1 kernel: LNetError: 76757:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 09 02:20:00 fir-io7-s1 kernel: LNetError: 76757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 02:20:00 fir-io7-s1 kernel: LNetError: 76757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 472 previous similar messages Mar 09 02:22:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 02:22:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 02:22:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 1 seconds Mar 09 02:22:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 75 previous similar messages Mar 09 02:28:50 fir-io7-s1 kernel: LNetError: 76757:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 02:28:50 fir-io7-s1 kernel: LNetError: 76757:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 09 02:30:05 fir-io7-s1 kernel: LNetError: 76757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 02:30:05 fir-io7-s1 kernel: LNetError: 76757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 471 previous similar messages Mar 09 02:32:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 09 02:32:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 02:33:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 02:33:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 83 previous similar messages Mar 09 02:38:50 fir-io7-s1 kernel: LNetError: 76757:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 02:38:50 fir-io7-s1 kernel: LNetError: 76757:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 09 02:40:05 fir-io7-s1 kernel: LNetError: 77805:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 02:40:05 fir-io7-s1 kernel: LNetError: 77805:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 467 previous similar messages Mar 09 02:42:43 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 02:42:43 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 02:43:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 02:43:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 422 previous similar messages Mar 09 02:48:50 fir-io7-s1 kernel: LNetError: 78096:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 02:48:50 fir-io7-s1 kernel: LNetError: 78096:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 09 02:50:05 fir-io7-s1 kernel: LNetError: 78096:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 02:50:05 fir-io7-s1 kernel: LNetError: 78096:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 453 previous similar messages Mar 09 02:52:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 02:52:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 02:53:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 02:53:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 234 previous similar messages Mar 09 02:58:50 fir-io7-s1 kernel: LNetError: 77091:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 02:58:50 fir-io7-s1 kernel: LNetError: 77091:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages Mar 09 03:00:05 fir-io7-s1 kernel: LNetError: 78441:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 03:00:05 fir-io7-s1 kernel: LNetError: 78441:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 415 previous similar messages Mar 09 03:02:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 03:02:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 03:03:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 09 03:03:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 214 previous similar messages Mar 09 03:08:50 fir-io7-s1 kernel: LNetError: 77091:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 03:08:50 fir-io7-s1 kernel: LNetError: 77091:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages Mar 09 03:10:10 fir-io7-s1 kernel: LNetError: 78634:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 03:10:10 fir-io7-s1 kernel: LNetError: 78634:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 409 previous similar messages Mar 09 03:12:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 03:12:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 03:14:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 09 03:14:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 281 previous similar messages Mar 09 03:18:50 fir-io7-s1 kernel: LNetError: 78934:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 03:18:50 fir-io7-s1 kernel: LNetError: 78934:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 274 previous similar messages Mar 09 03:20:10 fir-io7-s1 kernel: LNetError: 78934:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 03:20:10 fir-io7-s1 kernel: LNetError: 78934:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 448 previous similar messages Mar 09 03:22:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 03:22:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 03:24:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 03:24:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 463 previous similar messages Mar 09 03:28:50 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 03:28:50 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 274 previous similar messages Mar 09 03:30:10 fir-io7-s1 kernel: LNetError: 79321:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 03:30:10 fir-io7-s1 kernel: LNetError: 79321:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 440 previous similar messages Mar 09 03:32:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 03:32:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 03:34:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 09 03:34:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages Mar 09 03:38:50 fir-io7-s1 kernel: LNetError: 79321:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 03:38:50 fir-io7-s1 kernel: LNetError: 79321:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages Mar 09 03:40:10 fir-io7-s1 kernel: LNetError: 79924:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 03:40:10 fir-io7-s1 kernel: LNetError: 79924:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 370 previous similar messages Mar 09 03:42:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 03:42:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 03:44:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 3 seconds Mar 09 03:44:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 223 previous similar messages Mar 09 03:48:51 fir-io7-s1 kernel: LNetError: 79924:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 03:48:51 fir-io7-s1 kernel: LNetError: 79924:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages Mar 09 03:50:11 fir-io7-s1 kernel: LNetError: 80299:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 09 03:50:11 fir-io7-s1 kernel: LNetError: 80299:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 09 03:52:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 03:52:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 03:54:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 2 seconds Mar 09 03:54:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 250 previous similar messages Mar 09 03:58:55 fir-io7-s1 kernel: LNetError: 80299:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 03:58:55 fir-io7-s1 kernel: LNetError: 80299:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages Mar 09 04:00:15 fir-io7-s1 kernel: LNetError: 80299:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 09 04:00:15 fir-io7-s1 kernel: LNetError: 80299:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 09 04:03:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 04:03:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 04:04:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 09 04:04:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 306 previous similar messages Mar 09 04:09:00 fir-io7-s1 kernel: LNetError: 80894:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 04:09:00 fir-io7-s1 kernel: LNetError: 80894:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 04:10:25 fir-io7-s1 kernel: LNetError: 80894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 09 04:10:25 fir-io7-s1 kernel: LNetError: 80894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 09 04:13:06 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 04:13:06 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 09 04:14:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 09 04:14:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 481 previous similar messages Mar 09 04:19:05 fir-io7-s1 kernel: LNetError: 81378:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 04:19:05 fir-io7-s1 kernel: LNetError: 81378:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 09 04:20:35 fir-io7-s1 kernel: LNetError: 81378:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 09 04:20:35 fir-io7-s1 kernel: LNetError: 81378:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 09 04:23:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 04:23:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 04:25:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 09 04:25:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 394 previous similar messages Mar 09 04:29:05 fir-io7-s1 kernel: LNetError: 81378:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 04:29:05 fir-io7-s1 kernel: LNetError: 81378:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 09 04:30:45 fir-io7-s1 kernel: LNetError: 81729:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 04:30:45 fir-io7-s1 kernel: LNetError: 81729:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 09 04:33:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 04:33:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 04:35:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 09 04:35:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 539 previous similar messages Mar 09 04:39:05 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 04:39:05 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 09 04:40:45 fir-io7-s1 kernel: LNetError: 81729:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 09 04:40:45 fir-io7-s1 kernel: LNetError: 81729:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 09 04:43:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 04:43:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 04:45:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 8 seconds Mar 09 04:45:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 830 previous similar messages Mar 09 04:49:05 fir-io7-s1 kernel: LNetError: 82134:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 04:49:05 fir-io7-s1 kernel: LNetError: 82134:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 09 04:50:50 fir-io7-s1 kernel: LNetError: 82429:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 04:50:50 fir-io7-s1 kernel: LNetError: 82429:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 344 previous similar messages Mar 09 04:53:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125 Mar 09 04:53:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 04:55:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 09 04:55:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 564 previous similar messages Mar 09 04:59:05 fir-io7-s1 kernel: LNetError: 82429:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 04:59:05 fir-io7-s1 kernel: LNetError: 82429:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 09 05:00:50 fir-io7-s1 kernel: LNetError: 82774:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 05:00:50 fir-io7-s1 kernel: LNetError: 82774:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 09 05:03:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 05:03:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 05:05:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 09 05:05:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 266 previous similar messages Mar 09 05:09:10 fir-io7-s1 kernel: LNetError: 82774:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 05:09:10 fir-io7-s1 kernel: LNetError: 82774:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 252 previous similar messages Mar 09 05:10:50 fir-io7-s1 kernel: LNetError: 82774:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 05:10:50 fir-io7-s1 kernel: LNetError: 82774:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 09 05:13:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 05:13:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 05:15:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 05:15:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 256 previous similar messages Mar 09 05:19:10 fir-io7-s1 kernel: LNetError: 83428:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 05:19:10 fir-io7-s1 kernel: LNetError: 83428:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 253 previous similar messages Mar 09 05:21:00 fir-io7-s1 kernel: LNetError: 83428:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 05:21:00 fir-io7-s1 kernel: LNetError: 83428:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 377 previous similar messages Mar 09 05:23:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 05:23:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 05:25:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 05:25:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 482 previous similar messages Mar 09 05:29:10 fir-io7-s1 kernel: LNetError: 83043:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 05:29:10 fir-io7-s1 kernel: LNetError: 83043:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 05:31:05 fir-io7-s1 kernel: LNetError: 83813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 05:31:05 fir-io7-s1 kernel: LNetError: 83813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 09 05:33:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 05:33:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 09 05:35:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 3 seconds Mar 09 05:35:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 467 previous similar messages Mar 09 05:39:10 fir-io7-s1 kernel: LNetError: 83988:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 05:39:10 fir-io7-s1 kernel: LNetError: 83988:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 09 05:41:07 fir-io7-s1 kernel: LNetError: 84192:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 05:41:07 fir-io7-s1 kernel: LNetError: 84192:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 09 05:43:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.239@o2ib7: -125 Mar 09 05:43:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 09 05:45:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 2 seconds Mar 09 05:45:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 455 previous similar messages Mar 09 05:49:10 fir-io7-s1 kernel: LNetError: 84192:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 05:49:10 fir-io7-s1 kernel: LNetError: 84192:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages Mar 09 05:51:09 fir-io7-s1 kernel: LNetError: 84192:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 05:51:09 fir-io7-s1 kernel: LNetError: 84192:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 338 previous similar messages Mar 09 05:53:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 05:53:23 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 05:55:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 09 05:55:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 223 previous similar messages Mar 09 05:59:10 fir-io7-s1 kernel: LNetError: 84691:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 05:59:10 fir-io7-s1 kernel: LNetError: 84691:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 06:01:11 fir-io7-s1 kernel: LNetError: 84897:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 06:01:11 fir-io7-s1 kernel: LNetError: 84897:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 09 06:03:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 06:03:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 06:05:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 06:05:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 265 previous similar messages Mar 09 06:09:10 fir-io7-s1 kernel: LNetError: 85186:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 06:09:10 fir-io7-s1 kernel: LNetError: 85186:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 252 previous similar messages Mar 09 06:11:13 fir-io7-s1 kernel: LNetError: 85186:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 06:11:13 fir-io7-s1 kernel: LNetError: 85186:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 09 06:13:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 06:13:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 06:15:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 09 06:15:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 295 previous similar messages Mar 09 06:19:14 fir-io7-s1 kernel: LNetError: 85425:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 06:19:14 fir-io7-s1 kernel: LNetError: 85425:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 09 06:21:15 fir-io7-s1 kernel: LNetError: 85425:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 06:21:15 fir-io7-s1 kernel: LNetError: 85425:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 09 06:23:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 06:23:28 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 06:25:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 6 seconds Mar 09 06:25:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 533 previous similar messages Mar 09 06:29:15 fir-io7-s1 kernel: LNetError: 85940:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 06:29:15 fir-io7-s1 kernel: LNetError: 85940:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 09 06:31:17 fir-io7-s1 kernel: LNetError: 85716:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 06:31:17 fir-io7-s1 kernel: LNetError: 85716:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 334 previous similar messages Mar 09 06:33:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 06:33:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 06:35:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 5 seconds Mar 09 06:35:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 484 previous similar messages Mar 09 06:39:18 fir-io7-s1 kernel: LNetError: 85716:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 06:39:18 fir-io7-s1 kernel: LNetError: 85716:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 238 previous similar messages Mar 09 06:41:19 fir-io7-s1 kernel: LNetError: 85716:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 06:41:19 fir-io7-s1 kernel: LNetError: 85716:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 360 previous similar messages Mar 09 06:43:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 06:43:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 06:45:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 09 06:45:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 488 previous similar messages Mar 09 06:49:20 fir-io7-s1 kernel: LNetError: 85716:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 06:49:20 fir-io7-s1 kernel: LNetError: 85716:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages Mar 09 06:51:20 fir-io7-s1 kernel: LNetError: 58908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 06:51:20 fir-io7-s1 kernel: LNetError: 58908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 09 06:53:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 06:53:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 06:56:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 09 06:56:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages Mar 09 06:59:20 fir-io7-s1 kernel: LNetError: 86690:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 06:59:20 fir-io7-s1 kernel: LNetError: 86690:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages Mar 09 07:01:20 fir-io7-s1 kernel: LNetError: 87037:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 07:01:20 fir-io7-s1 kernel: LNetError: 87037:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 335 previous similar messages Mar 09 07:03:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 07:03:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 07:06:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 6 seconds Mar 09 07:06:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 472 previous similar messages Mar 09 07:09:24 fir-io7-s1 kernel: LNetError: 87037:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 07:09:24 fir-io7-s1 kernel: LNetError: 87037:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 09 07:11:25 fir-io7-s1 kernel: LNetError: 87037:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 07:11:25 fir-io7-s1 kernel: LNetError: 87037:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages Mar 09 07:13:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 07:13:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 09 07:16:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 09 07:16:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 453 previous similar messages Mar 09 07:19:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 07:19:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages Mar 09 07:21:27 fir-io7-s1 kernel: LNetError: 87500:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 07:21:27 fir-io7-s1 kernel: LNetError: 87500:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 09 07:23:43 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 07:23:43 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 07:26:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 10 seconds Mar 09 07:26:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 348 previous similar messages Mar 09 07:29:25 fir-io7-s1 kernel: LNetError: 87500:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 07:29:25 fir-io7-s1 kernel: LNetError: 87500:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 09 07:31:29 fir-io7-s1 kernel: LNetError: 88137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 07:31:29 fir-io7-s1 kernel: LNetError: 88137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 09 07:33:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 07:33:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 07:36:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 09 07:36:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 406 previous similar messages Mar 09 07:38:23 fir-io7-s1 kernel: LustreError: 68849:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli 8b6b6a33-9ab5-4 claims 28672 GRANT, real grant 0 Mar 09 07:39:25 fir-io7-s1 kernel: LNetError: 88137:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 07:39:25 fir-io7-s1 kernel: LNetError: 88137:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 09 07:41:31 fir-io7-s1 kernel: LNetError: 88137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 07:41:31 fir-io7-s1 kernel: LNetError: 88137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 09 07:43:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 07:43:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 07:46:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 07:46:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 465 previous similar messages Mar 09 07:49:25 fir-io7-s1 kernel: LNetError: 88413:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 07:49:25 fir-io7-s1 kernel: LNetError: 88413:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 07:49:58 fir-io7-s1 kernel: LustreError: 68854:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0052: cli 65fedb94-cd76-4 claims 28672 GRANT, real grant 16384 Mar 09 07:51:33 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 07:51:33 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 09 07:53:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 07:53:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 07:56:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 09 07:56:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 458 previous similar messages Mar 09 07:59:30 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 07:59:30 fir-io7-s1 kernel: LNetError: 58908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 09 08:01:35 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 08:01:35 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 09 08:03:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 08:03:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 08:06:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 5 seconds Mar 09 08:06:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 303 previous similar messages Mar 09 08:09:31 fir-io7-s1 kernel: LNetError: 88765:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 08:09:31 fir-io7-s1 kernel: LNetError: 88765:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 244 previous similar messages Mar 09 08:11:38 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 08:11:38 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 333 previous similar messages Mar 09 08:13:55 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 08:13:55 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 08:16:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 09 08:16:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 326 previous similar messages Mar 09 08:19:31 fir-io7-s1 kernel: LNetError: 89727:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 08:19:31 fir-io7-s1 kernel: LNetError: 89727:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 08:21:40 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 08:21:40 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 09 08:23:17 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 09 08:23:17 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 09 08:23:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 08:23:33 fir-io7-s1 kernel: LustreError: 137-5: fir-OST0049_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 09 08:23:33 fir-io7-s1 kernel: LustreError: Skipped 75 previous similar messages Mar 09 08:23:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 08:23:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 08:23:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 09 08:23:58 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 09 08:23:58 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 09 08:23:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 08:25:11 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 09 08:25:11 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 09 08:25:11 fir-io7-s1 kernel: Lustre: 
fir-OST0052: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 09 08:25:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 08:25:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 08:25:11 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 09 08:25:37 fir-io7-s1 kernel: LustreError: 137-5: fir-OST0049_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 09 08:25:37 fir-io7-s1 kernel: LustreError: Skipped 4 previous similar messages Mar 09 08:26:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 09 08:26:02 fir-io7-s1 kernel: Lustre: Skipped 8 previous similar messages Mar 09 08:27:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 08:27:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 426 previous similar messages Mar 09 08:29:31 fir-io7-s1 kernel: LNetError: 88413:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 08:29:31 fir-io7-s1 kernel: LNetError: 88413:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 09 08:31:41 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 09 08:31:41 fir-io7-s1 kernel: LNetError: 88765:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 09 08:33:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 08:33:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 08:37:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 09 08:37:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 358 previous similar messages Mar 09 08:39:31 fir-io7-s1 kernel: LNetError: 90336:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 08:39:31 fir-io7-s1 kernel: LNetError: 90336:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 09 08:41:41 fir-io7-s1 kernel: LNetError: 90596:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 08:41:41 fir-io7-s1 kernel: LNetError: 90596:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 09 08:43:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 08:43:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 08:47:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 09 08:47:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 332 previous similar messages Mar 09 08:49:31 fir-io7-s1 kernel: LNetError: 90596:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 08:49:31 fir-io7-s1 kernel: LNetError: 90596:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages Mar 09 08:51:46 fir-io7-s1 kernel: LNetError: 90984:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 08:51:46 fir-io7-s1 kernel: LNetError: 90984:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages Mar 09 08:54:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 08:54:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 09 08:57:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 5 seconds Mar 09 08:57:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 408 previous similar messages Mar 09 08:59:31 fir-io7-s1 kernel: LNetError: 90984:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 08:59:31 fir-io7-s1 kernel: LNetError: 90984:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 09 09:01:46 fir-io7-s1 kernel: LNetError: 91335:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 09 09:01:46 fir-io7-s1 kernel: LNetError: 91335:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 09 09:04:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 09:04:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 09 09:07:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 5 seconds Mar 09 09:07:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 551 previous similar messages Mar 09 09:09:36 fir-io7-s1 kernel: LNetError: 91335:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 09:09:36 fir-io7-s1 kernel: LNetError: 91335:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 09 09:11:50 fir-io7-s1 kernel: LNetError: 91710:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 09:11:50 fir-io7-s1 kernel: LNetError: 91710:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 09 09:14:06 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 09:14:06 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 09:17:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 10 seconds Mar 09 09:17:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 819 previous similar messages Mar 09 09:19:36 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 09:19:36 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 09 09:21:52 fir-io7-s1 kernel: LNetError: 91710:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 09:21:52 fir-io7-s1 kernel: LNetError: 91710:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 09 09:24:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 09:24:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 09:27:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 2 seconds Mar 09 09:27:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 606 previous similar messages Mar 09 09:29:36 fir-io7-s1 kernel: LNetError: 91710:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 09:29:36 fir-io7-s1 kernel: LNetError: 91710:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 09 09:31:54 fir-io7-s1 kernel: LNetError: 91710:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 09:31:54 fir-io7-s1 kernel: LNetError: 91710:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 09 09:34:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 Mar 09 09:34:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 09:37:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 09 09:37:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 560 previous similar messages Mar 09 09:39:36 fir-io7-s1 kernel: LNetError: 91710:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 09:39:36 fir-io7-s1 kernel: LNetError: 91710:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 09 09:41:56 fir-io7-s1 kernel: LNetError: 92756:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 09:41:56 fir-io7-s1 kernel: LNetError: 92756:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 342 previous similar messages Mar 09 09:44:12 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 09:44:12 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 21 previous similar messages Mar 09 09:47:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 09 09:47:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 603 previous similar messages Mar 09 09:49:36 fir-io7-s1 kernel: LNetError: 92756:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 09:49:36 fir-io7-s1 kernel: LNetError: 92756:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 09 09:51:58 fir-io7-s1 kernel: LNetError: 93111:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 09:51:58 fir-io7-s1 kernel: LNetError: 93111:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 09 09:54:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 09:54:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 09 09:57:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 1 seconds Mar 09 09:57:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 644 previous similar messages Mar 09 09:59:36 fir-io7-s1 kernel: LNetError: 93111:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 09:59:36 fir-io7-s1 kernel: LNetError: 93111:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 09 10:02:00 fir-io7-s1 kernel: LNetError: 93460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 10:02:00 fir-io7-s1 kernel: LNetError: 93460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 09 10:04:16 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 10:04:16 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 10:08:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 09 10:08:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 638 previous similar messages Mar 09 10:09:41 fir-io7-s1 kernel: LNetError: 93675:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 10:09:41 fir-io7-s1 kernel: LNetError: 93675:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 09 10:12:02 fir-io7-s1 kernel: LNetError: 93889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 10:12:02 fir-io7-s1 kernel: LNetError: 93889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 09 10:14:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 10:14:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 10:18:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 09 10:18:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 638 previous similar messages Mar 09 10:19:41 fir-io7-s1 kernel: LNetError: 93471:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 10:19:41 fir-io7-s1 kernel: LNetError: 93471:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 09 10:22:03 fir-io7-s1 kernel: LNetError: 93889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 10:22:03 fir-io7-s1 kernel: LNetError: 93889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 362 previous similar messages Mar 09 10:22:17 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2539e822-064e-4 (at 10.50.0.61@o2ib2) Mar 09 10:22:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 10:24:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 10:24:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 10:28:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 2 seconds Mar 09 10:28:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 660 previous similar messages Mar 09 10:29:46 fir-io7-s1 kernel: LNetError: 93889:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 10:29:46 fir-io7-s1 kernel: LNetError: 93889:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 230 previous similar messages Mar 09 10:32:05 fir-io7-s1 kernel: LNetError: 93889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 10:32:05 fir-io7-s1 kernel: LNetError: 93889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 09 10:34:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 10:34:21 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 10:38:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 09 10:38:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 692 previous similar messages Mar 09 10:39:46 fir-io7-s1 kernel: LNetError: 93889:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 10:39:46 fir-io7-s1 kernel: LNetError: 93889:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 252 previous similar messages Mar 09 10:42:07 fir-io7-s1 kernel: LNetError: 94878:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 10:42:07 fir-io7-s1 kernel: LNetError: 94878:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 373 previous similar messages Mar 09 10:44:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 10:44:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 10:48:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 10:48:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 478 previous similar messages Mar 09 10:49:46 fir-io7-s1 kernel: LNetError: 95080:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 10:49:46 fir-io7-s1 kernel: LNetError: 95080:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 263 previous similar messages Mar 09 10:52:09 fir-io7-s1 kernel: LNetError: 94878:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 10:52:09 fir-io7-s1 kernel: LNetError: 94878:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 09 10:53:16 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client ebe282dd-9eaf-4 (at 10.50.0.61@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c78fe143800, cur 1583776396 expire 1583776246 last 1583776169 Mar 09 10:53:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 10:54:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 10:54:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 10:58:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 09 10:58:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 380 previous similar messages Mar 09 10:59:46 fir-io7-s1 kernel: LNetError: 95289:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 10:59:46 fir-io7-s1 kernel: LNetError: 95289:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 256 previous similar messages Mar 09 11:02:11 fir-io7-s1 kernel: LNetError: 95442:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 11:02:11 fir-io7-s1 kernel: LNetError: 95442:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 379 previous similar messages Mar 09 11:09:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 2 seconds Mar 09 11:09:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 415 previous similar messages Mar 09 11:09:22 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 11:09:22 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 23 previous similar messages Mar 09 11:09:46 fir-io7-s1 kernel: LNetError: 95442:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 11:09:46 fir-io7-s1 kernel: LNetError: 95442:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 251 previous similar messages Mar 09 11:12:13 fir-io7-s1 kernel: LNetError: 95442:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 11:12:13 fir-io7-s1 kernel: LNetError: 95442:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 373 previous similar messages Mar 09 11:19:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 09 11:19:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 225 previous similar messages Mar 09 11:19:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 11:19:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 11:19:46 fir-io7-s1 kernel: LNetError: 95442:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 11:19:46 fir-io7-s1 kernel: LNetError: 95442:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 256 previous similar messages Mar 09 11:22:15 fir-io7-s1 kernel: LNetError: 96306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 11:22:15 fir-io7-s1 kernel: LNetError: 96306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 368 previous similar messages Mar 09 11:29:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 11:29:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 11:29:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 09 11:29:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 300 previous similar messages Mar 09 11:29:46 fir-io7-s1 kernel: LNetError: 96306:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 11:29:46 fir-io7-s1 kernel: LNetError: 96306:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 254 previous similar messages Mar 09 11:32:16 fir-io7-s1 kernel: LNetError: 96306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 11:32:16 fir-io7-s1 kernel: LNetError: 96306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 442 previous similar messages Mar 09 11:39:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 11:39:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 11:39:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 09 11:39:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 240 previous similar messages Mar 09 11:39:46 fir-io7-s1 kernel: LNetError: 96306:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 11:39:46 fir-io7-s1 kernel: LNetError: 96306:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 248 previous similar messages Mar 09 11:42:20 fir-io7-s1 kernel: LNetError: 96306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 11:42:20 fir-io7-s1 kernel: LNetError: 96306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 470 previous similar messages Mar 09 11:49:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 11:49:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 11:49:51 fir-io7-s1 kernel: LNetError: 97242:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 11:49:51 fir-io7-s1 kernel: LNetError: 97242:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 09 11:50:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 09 11:50:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 122 previous similar messages Mar 09 11:52:22 fir-io7-s1 kernel: LNetError: 97242:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 11:52:22 fir-io7-s1 kernel: LNetError: 97242:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 457 previous similar messages Mar 09 11:59:51 fir-io7-s1 kernel: LNetError: 97242:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 11:59:51 fir-io7-s1 kernel: LNetError: 97242:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 09 12:00:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 09 12:00:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 696 previous similar messages Mar 09 12:00:40 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2539e822-064e-4 (at 10.50.0.61@o2ib2) Mar 09 12:00:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 12:02:24 fir-io7-s1 kernel: LNetError: 97704:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 12:02:24 fir-io7-s1 kernel: LNetError: 97704:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 09 12:04:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 12:04:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 12:09:56 fir-io7-s1 kernel: LNetError: 97704:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 12:09:56 fir-io7-s1 kernel: LNetError: 97704:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 09 12:10:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 6 seconds Mar 09 12:10:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 633 previous similar messages Mar 09 12:12:26 fir-io7-s1 kernel: LNetError: 98096:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 12:12:26 fir-io7-s1 kernel: LNetError: 98096:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 347 previous similar messages Mar 09 12:14:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 12:14:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 09 12:19:56 fir-io7-s1 kernel: LNetError: 98096:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 12:19:56 fir-io7-s1 kernel: LNetError: 98096:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 09 12:20:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 09 12:20:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 673 previous similar messages Mar 09 12:22:16 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client a76fb3e4-ae87-4 (at 10.50.0.64@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c57ee950c00, cur 1583781736 expire 1583781586 last 1583781509 Mar 09 12:22:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 12:22:26 fir-io7-s1 kernel: LNetError: 98096:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 12:22:26 fir-io7-s1 kernel: LNetError: 98096:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 09 12:24:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 12:24:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 8 previous similar messages Mar 09 12:30:01 fir-io7-s1 kernel: LNetError: 98096:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 12:30:01 fir-io7-s1 kernel: LNetError: 98096:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages Mar 09 12:30:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 6 seconds Mar 09 12:30:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 443 previous similar messages Mar 09 12:32:30 fir-io7-s1 kernel: LNetError: 98813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 12:32:30 fir-io7-s1 kernel: LNetError: 98813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 356 previous similar messages Mar 09 12:34:47 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 12:34:47 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 12:40:01 fir-io7-s1 kernel: LNetError: 90985:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery 
queue. Health = 900 Mar 09 12:40:01 fir-io7-s1 kernel: LNetError: 90985:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages Mar 09 12:40:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 09 12:40:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 610 previous similar messages Mar 09 12:42:32 fir-io7-s1 kernel: LNetError: 98813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 12:42:32 fir-io7-s1 kernel: LNetError: 98813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 09 12:44:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 12:44:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 12:50:01 fir-io7-s1 kernel: LNetError: 90985:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 12:50:01 fir-io7-s1 kernel: LNetError: 90985:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 247 previous similar messages Mar 09 12:50:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 09 12:50:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 462 previous similar messages Mar 09 12:52:34 fir-io7-s1 kernel: LNetError: 99538:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 12:52:34 fir-io7-s1 kernel: LNetError: 99538:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 359 previous similar messages Mar 09 12:54:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.237@o2ib7: -125 Mar 09 12:54:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 13:00:01 fir-io7-s1 kernel: LNetError: 99748:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 13:00:01 fir-io7-s1 kernel: LNetError: 99748:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 248 previous similar messages Mar 09 13:00:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 09 13:00:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 368 previous similar messages Mar 09 13:02:35 fir-io7-s1 kernel: LNetError: 99748:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 13:02:35 fir-io7-s1 kernel: LNetError: 99748:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages Mar 09 13:04:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to a76fb3e4-ae87-4 (at 10.50.0.64@o2ib2) Mar 09 13:04:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 13:04:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 13:04:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 13:10:01 fir-io7-s1 kernel: LNetError: 100075:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 13:10:01 fir-io7-s1 kernel: LNetError: 100075:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages Mar 09 13:10:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 09 13:10:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 205 previous similar messages Mar 09 13:12:36 fir-io7-s1 kernel: LNetError: 100075:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 09 13:12:36 fir-io7-s1 kernel: LNetError: 100075:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 09 13:14:55 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 13:14:55 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 09 13:20:01 fir-io7-s1 kernel: LNetError: 100426:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 13:20:01 fir-io7-s1 kernel: LNetError: 100426:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages Mar 09 13:20:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 1 seconds Mar 09 13:20:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 279 previous similar messages Mar 09 13:22:39 fir-io7-s1 kernel: LNetError: 100426:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 13:22:39 fir-io7-s1 kernel: LNetError: 100426:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 09 13:29:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 13:29:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 13:30:01 fir-io7-s1 kernel: LNetError: 100426:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 13:30:01 fir-io7-s1 kernel: LNetError: 100426:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 09 13:30:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 09 13:30:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 348 previous similar messages Mar 09 13:32:41 fir-io7-s1 kernel: LNetError: 101032:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 13:32:41 fir-io7-s1 kernel: LNetError: 101032:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 373 previous similar messages Mar 09 13:39:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 13:39:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 13:40:01 fir-io7-s1 kernel: LNetError: 99771:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 13:40:01 fir-io7-s1 kernel: LNetError: 99771:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 244 previous similar messages Mar 09 13:40:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 09 13:40:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 286 previous similar messages Mar 09 13:42:43 fir-io7-s1 kernel: LNetError: 101296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 13:42:43 fir-io7-s1 kernel: LNetError: 101296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 09 13:50:00 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 13:50:00 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 13:50:03 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 13:50:03 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 09 13:50:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 8 seconds Mar 09 13:50:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 291 previous similar messages Mar 09 13:52:45 fir-io7-s1 kernel: LNetError: 101652:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 13:52:45 fir-io7-s1 kernel: LNetError: 101652:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 09 14:00:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 14:00:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 09 14:00:06 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 14:00:06 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages Mar 09 14:01:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 09 14:01:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 319 previous similar messages Mar 09 14:02:46 fir-io7-s1 kernel: LNetError: 101652:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 14:02:46 fir-io7-s1 kernel: LNetError: 101652:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 09 14:10:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 14:10:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 14:10:06 fir-io7-s1 kernel: LNetError: 102247:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 14:10:06 fir-io7-s1 kernel: LNetError: 102247:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 09 14:11:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 10 seconds Mar 09 14:11:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 366 previous similar messages Mar 09 14:12:49 fir-io7-s1 kernel: LNetError: 102516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 14:12:49 fir-io7-s1 kernel: LNetError: 102516:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 09 14:20:06 fir-io7-s1 kernel: LNetError: 102757:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 14:20:06 fir-io7-s1 kernel: LNetError: 102757:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages Mar 09 14:21:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 09 14:21:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 378 previous similar messages Mar 09 14:22:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 1f639cf6-1da0-4 (at 10.50.0.63@o2ib2) Mar 09 14:22:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 14:22:51 fir-io7-s1 kernel: LNetError: 102891:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 14:22:51 fir-io7-s1 kernel: LNetError: 102891:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 09 14:25:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 14:25:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 09 14:30:06 fir-io7-s1 kernel: LNetError: 103148:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 14:30:06 fir-io7-s1 kernel: LNetError: 103148:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 09 14:31:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 2 seconds Mar 09 14:31:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 275 previous similar messages Mar 09 14:32:53 fir-io7-s1 kernel: LNetError: 103148:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 14:32:53 fir-io7-s1 kernel: LNetError: 103148:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 09 14:35:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 14:35:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 14:40:06 fir-io7-s1 kernel: LNetError: 103534:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 14:40:06 fir-io7-s1 kernel: LNetError: 103534:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 09 14:41:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 5 seconds Mar 09 14:41:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 261 previous similar messages Mar 09 14:42:55 fir-io7-s1 kernel: LNetError: 103148:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 14:42:55 fir-io7-s1 kernel: LNetError: 103148:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 09 14:45:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 14:45:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 14:50:06 fir-io7-s1 kernel: LNetError: 103148:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 14:50:06 fir-io7-s1 kernel: LNetError: 103148:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 14:52:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 09 14:52:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 265 previous similar messages Mar 09 14:52:56 fir-io7-s1 kernel: LNetError: 103939:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 09 14:52:56 fir-io7-s1 kernel: LNetError: 103939:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 366 previous similar messages Mar 09 14:54:00 fir-io7-s1 kernel: LustreError: 73705:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli 83308672-91dd-4 claims 28672 GRANT, real grant 12288 Mar 09 14:55:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 14:55:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 15:00:06 fir-io7-s1 kernel: LNetError: 103939:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 15:00:06 fir-io7-s1 kernel: LNetError: 103939:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 238 previous similar messages Mar 09 15:02:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 09 15:02:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 198 previous similar messages Mar 09 15:02:56 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 15:02:56 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 356 previous similar messages Mar 09 15:05:14 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 15:05:14 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 15:06:28 fir-io7-s1 kernel: LustreError: 40811:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli 83308672-91dd-4 claims 28672 GRANT, real grant 20480 Mar 09 15:06:37 fir-io7-s1 kernel: LustreError: 68348:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli ef7e7ddb-0d3b-4 claims 16375808 GRANT, real grant 32768 Mar 09 15:10:06 fir-io7-s1 kernel: LNetError: 104287:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 15:10:06 fir-io7-s1 kernel: LNetError: 104287:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 09 15:12:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 09 15:12:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 141 previous similar messages Mar 09 15:13:01 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 15:13:01 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 09 15:15:14 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 15:15:14 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 09 15:20:07 fir-io7-s1 kernel: LNetError: 104287:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 15:20:07 fir-io7-s1 kernel: LNetError: 104287:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 09 15:21:05 fir-io7-s1 kernel: LustreError: 90839:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli a387b29d-067d-4 claims 49152 GRANT, real grant 0 Mar 09 15:22:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 5 seconds Mar 09 15:22:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 81 previous similar messages Mar 09 15:23:04 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 15:23:04 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 389 previous similar messages Mar 09 15:25:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 15:25:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 15:30:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 15:30:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 09 15:32:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 09 15:32:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 108 previous similar messages Mar 09 15:33:06 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 15:33:06 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 380 previous similar messages Mar 09 15:35:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 15:35:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 15:40:10 fir-io7-s1 kernel: LNetError: 105172:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 15:40:10 fir-io7-s1 kernel: LNetError: 105172:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 09 15:43:06 fir-io7-s1 kernel: LNetError: 105447:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 09 15:43:06 fir-io7-s1 kernel: LNetError: 105447:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 387 previous similar messages Mar 09 15:43:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 09 15:43:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 153 previous similar messages Mar 09 15:45:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 15:45:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 15:50:11 fir-io7-s1 kernel: LNetError: 105796:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 15:50:11 fir-io7-s1 kernel: LNetError: 105796:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 09 15:53:06 fir-io7-s1 kernel: LNetError: 106051:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 15:53:06 fir-io7-s1 kernel: LNetError: 106051:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 389 previous similar messages Mar 09 15:53:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 09 15:53:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 447 previous similar messages Mar 09 15:55:26 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 15:55:26 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 09 16:00:11 fir-io7-s1 kernel: LNetError: 106051:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 16:00:11 fir-io7-s1 kernel: LNetError: 106051:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 256 previous similar messages Mar 09 16:03:06 fir-io7-s1 kernel: LNetError: 106051:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 09 16:03:06 fir-io7-s1 kernel: LNetError: 106051:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 385 previous similar messages Mar 09 16:03:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 1 seconds Mar 09 16:03:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 364 previous similar messages Mar 09 16:05:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 Mar 09 16:05:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 16:07:58 fir-io7-s1 kernel: LustreError: 68379:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli e225f3d7-7aff-4 claims 3493888 GRANT, real grant 2125824 Mar 09 16:10:11 fir-io7-s1 kernel: LNetError: 106051:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 16:10:11 fir-io7-s1 kernel: LNetError: 106051:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages Mar 09 16:13:06 fir-io7-s1 kernel: LNetError: 106770:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 16:13:06 fir-io7-s1 kernel: LNetError: 106770:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 09 16:13:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 1 seconds Mar 09 16:13:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 229 previous similar messages Mar 09 16:15:30 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 16:15:30 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 16:20:16 fir-io7-s1 kernel: LNetError: 106770:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 16:20:16 fir-io7-s1 kernel: LNetError: 106770:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages Mar 09 16:23:11 fir-io7-s1 kernel: LNetError: 107123:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 09 16:23:11 fir-io7-s1 kernel: LNetError: 107123:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 360 previous similar messages Mar 09 16:23:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 09 16:23:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 366 previous similar messages Mar 09 16:25:31 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 16:25:31 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 16:30:19 fir-io7-s1 kernel: LNetError: 106223:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 16:30:19 fir-io7-s1 kernel: LNetError: 106223:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 257 previous similar messages Mar 09 16:33:11 fir-io7-s1 kernel: LNetError: 107123:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 16:33:11 fir-io7-s1 kernel: LNetError: 107123:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 09 16:33:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 09 16:33:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 410 previous similar messages Mar 09 16:34:45 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 0a66dea3-813c-4 (at 10.50.0.12@o2ib2) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c698f1d2000, cur 1583796885 expire 1583796735 last 1583796658 Mar 09 16:34:45 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 09 16:35:35 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 16:35:35 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 16:40:20 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 16:40:20 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 228 previous similar messages Mar 09 16:43:11 fir-io7-s1 kernel: LNetError: 90985:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 16:43:11 fir-io7-s1 kernel: LNetError: 90985:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 09 16:43:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 2 seconds Mar 09 16:43:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 226 previous similar messages Mar 09 16:50:21 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 16:50:21 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 09 16:50:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 16:50:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 09 16:53:11 fir-io7-s1 kernel: LNetError: 90985:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 16:53:11 fir-io7-s1 kernel: LNetError: 90985:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 367 previous similar messages Mar 09 16:53:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 2 seconds Mar 09 16:53:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 348 previous similar messages Mar 09 17:00:21 fir-io7-s1 kernel: LNetError: 108530:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 17:00:21 fir-io7-s1 kernel: LNetError: 108530:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages Mar 09 17:00:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 17:00:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 17:03:16 fir-io7-s1 kernel: LNetError: 107820:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 17:03:16 fir-io7-s1 kernel: LNetError: 107820:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 09 17:04:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 2 seconds Mar 09 17:04:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages Mar 09 17:06:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 0a66dea3-813c-4 (at 10.50.0.12@o2ib2) Mar 09 17:06:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 17:10:26 fir-io7-s1 kernel: LNetError: 108650:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 17:10:26 fir-io7-s1 kernel: LNetError: 108650:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages Mar 09 17:10:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 17:10:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 17:13:16 fir-io7-s1 kernel: LNetError: 90985:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 17:13:16 fir-io7-s1 kernel: LNetError: 90985:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 374 previous similar messages Mar 09 17:14:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 09 17:14:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 321 previous similar messages Mar 09 17:20:26 fir-io7-s1 kernel: LNetError: 108037:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 17:20:26 fir-io7-s1 kernel: LNetError: 108037:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 09 17:23:26 fir-io7-s1 kernel: LNetError: 108901:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 17:23:26 fir-io7-s1 kernel: LNetError: 108901:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 09 17:24:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 6 seconds Mar 09 17:24:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 375 previous similar messages Mar 09 17:25:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 17:25:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 23 previous similar messages Mar 09 17:30:26 fir-io7-s1 kernel: LNetError: 108037:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 17:30:26 fir-io7-s1 kernel: LNetError: 108037:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages Mar 09 17:33:29 fir-io7-s1 kernel: LNetError: 109358:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 17:33:29 fir-io7-s1 kernel: LNetError: 109358:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 09 17:34:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 1 seconds Mar 09 17:34:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 347 previous similar messages Mar 09 17:35:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 17:35:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 17:40:26 fir-io7-s1 kernel: LNetError: 109358:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 17:40:26 fir-io7-s1 kernel: LNetError: 109358:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 261 previous similar messages Mar 09 17:43:31 fir-io7-s1 kernel: LNetError: 81981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 17:43:31 fir-io7-s1 kernel: LNetError: 81981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 407 previous similar messages Mar 09 17:44:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 09 17:44:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 289 previous similar messages Mar 09 17:45:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 17:45:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 17:50:26 fir-io7-s1 kernel: LNetError: 110283:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 17:50:26 fir-io7-s1 kernel: LNetError: 110283:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages Mar 09 17:53:33 fir-io7-s1 kernel: LNetError: 110283:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 17:53:33 fir-io7-s1 kernel: LNetError: 110283:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 09 17:54:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 09 17:54:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 330 previous similar messages Mar 09 17:55:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 17:55:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 17:56:32 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b72573f6-5f4e-4 (at 10.49.21.11@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c5037c91800, cur 1583801792 expire 1583801642 last 1583801565 Mar 09 17:56:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 18:00:26 fir-io7-s1 kernel: LNetError: 109982:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 18:00:26 fir-io7-s1 kernel: LNetError: 109982:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages Mar 09 18:03:35 fir-io7-s1 kernel: LNetError: 110283:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 18:03:35 fir-io7-s1 kernel: LNetError: 110283:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 335 previous similar messages Mar 09 18:04:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 09 18:04:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 349 previous similar messages Mar 09 18:10:26 fir-io7-s1 kernel: LNetError: 110880:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 18:10:26 fir-io7-s1 kernel: LNetError: 110880:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 09 18:10:49 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 18:10:49 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 09 18:12:17 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 2fba0fed-0839-4 (at 10.50.14.4@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5037c92000, cur 1583802737 expire 1583802587 last 1583802510 Mar 09 18:12:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 18:13:36 fir-io7-s1 kernel: LNetError: 110796:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 18:13:36 fir-io7-s1 kernel: LNetError: 110796:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 09 18:14:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 09 18:14:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 352 previous similar messages Mar 09 18:20:31 fir-io7-s1 kernel: LNetError: 111123:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 18:20:31 fir-io7-s1 kernel: LNetError: 111123:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 09 18:20:49 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 18:20:49 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 18:23:36 fir-io7-s1 kernel: LNetError: 111400:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 18:23:36 fir-io7-s1 kernel: LNetError: 111400:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 09 18:24:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 5 seconds Mar 09 18:24:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 342 previous similar messages Mar 09 18:28:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to d4a82c84-dcbc-4 (at 10.49.21.11@o2ib1) Mar 09 18:28:02 fir-io7-s1 kernel: Lustre: 
Skipped 5 previous similar messages Mar 09 18:30:31 fir-io7-s1 kernel: LNetError: 111400:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 09 18:30:31 fir-io7-s1 kernel: LNetError: 111400:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages Mar 09 18:30:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 18:30:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 18:33:41 fir-io7-s1 kernel: LNetError: 111754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 18:33:41 fir-io7-s1 kernel: LNetError: 111754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 09 18:34:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 09 18:34:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 340 previous similar messages Mar 09 18:40:32 fir-io7-s1 kernel: LNetError: 110880:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 18:40:32 fir-io7-s1 kernel: LNetError: 110880:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages Mar 09 18:40:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 18:40:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 18:42:34 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 6539b467-d583-4 (at 10.49.23.3@o2ib1) Mar 09 18:42:34 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 09 18:43:42 fir-io7-s1 kernel: LNetError: 111754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 18:43:42 fir-io7-s1 kernel: LNetError: 111754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 09 18:44:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 18:44:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 335 previous similar messages Mar 09 18:50:41 fir-io7-s1 kernel: LNetError: 112207:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 18:50:41 fir-io7-s1 kernel: LNetError: 112207:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 224 previous similar messages Mar 09 18:50:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 18:50:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 18:53:31 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 85a6d889-5aaa-4 (at 10.50.14.4@o2ib2) Mar 09 18:53:31 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 09 18:53:46 fir-io7-s1 kernel: LNetError: 112207:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 18:53:46 fir-io7-s1 kernel: LNetError: 112207:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 362 previous similar messages Mar 09 18:54:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 3 seconds Mar 09 18:54:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 337 previous similar messages Mar 09 19:00:41 fir-io7-s1 kernel: LNetError: 112567:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 19:00:41 fir-io7-s1 kernel: LNetError: 112567:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 09 19:00:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 19:00:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 19:03:48 fir-io7-s1 kernel: LNetError: 112797:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 19:03:48 fir-io7-s1 kernel: LNetError: 112797:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 09 19:05:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 2 seconds Mar 09 19:05:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 292 previous similar messages Mar 09 19:10:41 fir-io7-s1 kernel: LNetError: 112797:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 19:10:41 fir-io7-s1 kernel: LNetError: 112797:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 09 19:11:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 19:11:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 19:13:50 fir-io7-s1 kernel: LNetError: 112797:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 19:13:50 fir-io7-s1 kernel: LNetError: 112797:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 09 19:15:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 09 19:15:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 146 previous similar messages Mar 09 19:20:41 fir-io7-s1 kernel: LNetError: 113319:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 19:20:41 fir-io7-s1 kernel: LNetError: 113319:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 09 19:21:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 19:21:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 19:23:52 fir-io7-s1 kernel: LNetError: 113477:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 19:23:52 fir-io7-s1 kernel: LNetError: 113477:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 367 previous similar messages Mar 09 19:25:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 19:25:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 451 previous similar messages Mar 09 19:26:34 fir-io7-s1 kernel: LustreError: 68861:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli 8b6b6a33-9ab5-4 claims 32768 GRANT, real grant 28672 Mar 09 19:30:41 fir-io7-s1 kernel: LNetError: 113477:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 19:30:41 fir-io7-s1 kernel: LNetError: 113477:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 09 19:31:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 19:31:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 19:33:54 fir-io7-s1 kernel: LNetError: 113864:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 19:33:54 fir-io7-s1 kernel: LNetError: 113864:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages Mar 09 19:35:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 09 19:35:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 582 previous similar messages Mar 09 19:40:41 fir-io7-s1 kernel: LNetError: 114128:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 19:40:41 fir-io7-s1 kernel: LNetError: 114128:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 09 19:41:11 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 19:41:11 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 09 19:42:50 fir-io7-s1 kernel: LustreError: 40794:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli 83308672-91dd-4 claims 28672 GRANT, real grant 16384 Mar 09 19:43:56 fir-io7-s1 kernel: LNetError: 114302:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 19:43:56 fir-io7-s1 kernel: LNetError: 114302:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 366 previous similar messages Mar 09 19:45:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 09 19:45:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 447 previous similar messages Mar 09 19:50:41 fir-io7-s1 kernel: LNetError: 114302:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 19:50:41 fir-io7-s1 kernel: LNetError: 114302:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 09 19:51:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 19:51:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 19:53:56 fir-io7-s1 kernel: LNetError: 114566:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 09 19:53:56 fir-io7-s1 kernel: LNetError: 114566:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 09 19:55:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 1 seconds Mar 09 19:55:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 457 previous similar messages Mar 09 20:00:46 fir-io7-s1 kernel: LNetError: 114566:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 20:00:46 fir-io7-s1 kernel: LNetError: 114566:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages Mar 09 20:01:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 09 20:01:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 8 previous similar messages Mar 09 20:03:56 fir-io7-s1 kernel: LNetError: 114919:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 20:03:56 fir-io7-s1 kernel: LNetError: 114919:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 09 20:05:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 09 20:05:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 461 previous similar messages Mar 09 20:10:46 fir-io7-s1 kernel: LNetError: 114919:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 20:10:46 fir-io7-s1 kernel: LNetError: 114919:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 249 previous similar messages Mar 09 20:11:16 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 20:11:16 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 20:14:01 fir-io7-s1 kernel: LNetError: 114919:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 20:14:01 fir-io7-s1 kernel: LNetError: 114919:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 09 20:15:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 09 20:15:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 418 previous similar messages Mar 09 20:20:46 fir-io7-s1 kernel: LNetError: 115623:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 20:20:46 fir-io7-s1 kernel: LNetError: 115623:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 255 previous similar messages Mar 09 20:21:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 09 20:21:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 20:24:01 fir-io7-s1 kernel: LNetError: 115623:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 20:24:01 fir-io7-s1 kernel: LNetError: 115623:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 09 20:25:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 09 20:25:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 378 previous similar messages Mar 09 20:30:46 fir-io7-s1 kernel: LNetError: 115623:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 20:30:46 fir-io7-s1 kernel: LNetError: 115623:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 09 20:31:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 20:31:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 20:34:01 fir-io7-s1 kernel: LNetError: 115984:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 20:34:01 fir-io7-s1 kernel: LNetError: 115984:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 347 previous similar messages Mar 09 20:35:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 2 seconds Mar 09 20:35:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 488 previous similar messages Mar 09 20:40:46 fir-io7-s1 kernel: LNetError: 115984:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 20:40:46 fir-io7-s1 kernel: LNetError: 115984:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 252 previous similar messages Mar 09 20:41:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 Mar 09 20:41:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 09 20:44:01 fir-io7-s1 kernel: LNetError: 116333:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 09 20:44:01 fir-io7-s1 kernel: LNetError: 116333:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 09 20:45:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 4 seconds Mar 09 20:45:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 335 previous similar messages Mar 09 20:50:46 fir-io7-s1 kernel: LNetError: 116333:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 20:50:46 fir-io7-s1 kernel: LNetError: 116333:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 253 previous similar messages Mar 09 20:51:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 20:51:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 09 20:54:01 fir-io7-s1 kernel: LNetError: 116333:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 20:54:01 fir-io7-s1 kernel: LNetError: 116333:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 370 previous similar messages Mar 09 20:56:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 09 20:56:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages Mar 09 21:00:46 fir-io7-s1 kernel: LNetError: 116534:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 21:00:46 fir-io7-s1 kernel: LNetError: 116534:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 248 previous similar messages Mar 09 21:01:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 21:01:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 21:04:01 fir-io7-s1 kernel: LNetError: 116809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 21:04:01 fir-io7-s1 kernel: LNetError: 116809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages Mar 09 21:06:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 09 21:06:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 324 previous similar messages Mar 09 21:10:46 fir-io7-s1 kernel: LNetError: 116906:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 21:10:46 fir-io7-s1 kernel: LNetError: 116906:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages Mar 09 21:11:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 21:11:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 21:14:06 fir-io7-s1 kernel: LNetError: 117157:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 21:14:06 fir-io7-s1 kernel: LNetError: 117157:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 09 21:16:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 09 21:16:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 388 previous similar messages Mar 09 21:20:46 fir-io7-s1 kernel: LNetError: 117517:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 21:20:46 fir-io7-s1 kernel: LNetError: 117517:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 246 previous similar messages Mar 09 21:21:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 Mar 09 21:21:29 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 21:24:11 fir-io7-s1 kernel: LNetError: 117517:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 09 21:24:11 fir-io7-s1 kernel: LNetError: 117517:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 342 previous similar messages Mar 09 21:26:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 2 seconds Mar 09 21:26:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 350 previous similar messages Mar 09 21:30:46 fir-io7-s1 kernel: LNetError: 117870:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 21:30:46 fir-io7-s1 kernel: LNetError: 117870:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 224 previous similar messages Mar 09 21:31:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 21:31:32 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 21:34:11 fir-io7-s1 kernel: LNetError: 118092:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 21:34:11 fir-io7-s1 kernel: LNetError: 118092:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 347 previous similar messages Mar 09 21:36:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 2 seconds Mar 09 21:36:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 396 previous similar messages Mar 09 21:40:46 fir-io7-s1 kernel: LNetError: 118092:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 21:40:46 fir-io7-s1 kernel: LNetError: 118092:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages Mar 09 21:41:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 21:41:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 09 21:44:16 fir-io7-s1 kernel: LNetError: 118444:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 21:44:16 fir-io7-s1 kernel: LNetError: 118444:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 09 21:46:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 22 seconds Mar 09 21:46:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 335 previous similar messages Mar 09 21:50:46 fir-io7-s1 kernel: LNetError: 118736:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 21:50:46 fir-io7-s1 kernel: LNetError: 118736:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 238 previous similar messages Mar 09 21:51:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 21:51:34 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 21:54:21 fir-io7-s1 kernel: LNetError: 118736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 21:54:21 fir-io7-s1 kernel: LNetError: 118736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 367 previous similar messages Mar 09 21:56:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 09 21:56:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 235 previous similar messages Mar 09 22:00:46 fir-io7-s1 kernel: LNetError: 118780:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 22:00:46 fir-io7-s1 kernel: LNetError: 118780:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages Mar 09 22:01:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 22:01:39 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 22:04:21 fir-io7-s1 kernel: LNetError: 118736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 22:04:21 fir-io7-s1 kernel: LNetError: 118736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 359 previous similar messages Mar 09 22:06:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 1 seconds Mar 09 22:06:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 151 previous similar messages Mar 09 22:10:46 fir-io7-s1 kernel: LNetError: 118736:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 22:10:46 fir-io7-s1 kernel: LNetError: 118736:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages Mar 09 22:11:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 22:11:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 22:14:21 fir-io7-s1 kernel: LNetError: 119507:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 09 22:14:21 fir-io7-s1 kernel: LNetError: 119507:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 09 22:16:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 09 22:16:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 270 previous similar messages Mar 09 22:20:46 fir-io7-s1 kernel: LNetError: 118780:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 22:20:46 fir-io7-s1 kernel: LNetError: 118780:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 09 22:24:21 fir-io7-s1 kernel: LNetError: 119507:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 09 22:24:21 fir-io7-s1 kernel: LNetError: 119507:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 09 22:26:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 22:26:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 23 previous similar messages Mar 09 22:26:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 1 seconds Mar 09 22:26:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 277 previous similar messages Mar 09 22:29:54 fir-io7-s1 kernel: LustreError: 66953:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli a387b29d-067d-4 claims 16752640 GRANT, real grant 49152 Mar 09 22:30:47 fir-io7-s1 kernel: LNetError: 120151:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 22:30:47 fir-io7-s1 kernel: LNetError: 120151:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 09 22:34:27 fir-io7-s1 kernel: LNetError: 120151:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 09 22:34:27 fir-io7-s1 kernel: LNetError: 120151:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 09 22:36:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 09 22:36:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 09 22:36:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 1 seconds Mar 09 22:36:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 257 previous similar messages Mar 09 22:40:47 fir-io7-s1 kernel: LNetError: 120431:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 22:40:47 fir-io7-s1 kernel: LNetError: 120431:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 09 22:41:02 fir-io7-s1 kernel: LustreError: 68788:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli 4683044c-87cf-4 claims 8183808 GRANT, real grant 57344 Mar 09 22:44:32 fir-io7-s1 kernel: LNetError: 120431:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 22:44:32 fir-io7-s1 kernel: LNetError: 120431:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 373 previous similar messages Mar 09 22:46:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 22:46:44 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 22:46:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 09 22:46:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 219 previous similar messages Mar 09 22:50:47 fir-io7-s1 kernel: LNetError: 75529:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 22:50:47 fir-io7-s1 kernel: LNetError: 75529:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 09 22:54:34 fir-io7-s1 kernel: LNetError: 120736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 22:54:34 fir-io7-s1 kernel: LNetError: 120736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 369 previous similar messages Mar 09 22:56:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 22:56:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 09 22:56:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 09 22:56:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 279 previous similar messages Mar 09 23:00:49 fir-io7-s1 kernel: LNetError: 121252:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 23:00:49 fir-io7-s1 kernel: LNetError: 121252:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 09 23:04:36 fir-io7-s1 kernel: LNetError: 121252:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 23:04:36 fir-io7-s1 kernel: LNetError: 121252:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 372 previous similar messages Mar 09 23:06:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 23:06:52 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 21 previous similar messages Mar 09 23:06:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 09 23:06:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 719 previous similar messages Mar 09 23:10:51 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 23:10:51 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 09 23:14:38 fir-io7-s1 kernel: LNetError: 121616:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 23:14:38 fir-io7-s1 kernel: LNetError: 121616:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 09 23:16:54 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 09 23:16:54 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 6 previous similar messages Mar 09 23:16:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 09 23:16:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 433 previous similar messages Mar 09 23:20:51 fir-io7-s1 kernel: LNetError: 121616:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 23:20:51 fir-io7-s1 kernel: LNetError: 121616:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 09 23:24:40 fir-io7-s1 kernel: LNetError: 121978:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 09 23:24:40 fir-io7-s1 kernel: LNetError: 121978:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 09 23:26:54 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 23:26:54 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 09 23:27:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 09 23:27:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 473 previous similar messages Mar 09 23:30:55 fir-io7-s1 kernel: LNetError: 121978:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 23:30:55 fir-io7-s1 kernel: LNetError: 121978:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 09 23:34:41 fir-io7-s1 kernel: LNetError: 121978:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 23:34:41 fir-io7-s1 kernel: LNetError: 121978:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 375 previous similar messages Mar 09 23:36:54 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 23:36:54 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 09 23:37:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 1 seconds Mar 09 23:37:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 647 previous similar messages Mar 09 23:40:56 fir-io7-s1 kernel: LNetError: 122687:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 23:40:56 fir-io7-s1 kernel: LNetError: 122687:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 253 previous similar messages Mar 09 23:44:41 fir-io7-s1 kernel: LNetError: 122780:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 09 23:44:41 fir-io7-s1 kernel: LNetError: 122780:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages Mar 09 23:46:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 09 23:46:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 09 23:47:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 2 seconds Mar 09 23:47:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 408 previous similar messages Mar 09 23:50:56 fir-io7-s1 kernel: LNetError: 122780:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 09 23:50:56 fir-io7-s1 kernel: LNetError: 122780:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages Mar 09 23:54:41 fir-io7-s1 kernel: LNetError: 123044:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 09 23:54:41 fir-io7-s1 kernel: LNetError: 123044:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 09 23:56:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 Mar 09 23:56:59 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 09 23:57:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 09 23:57:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 342 previous similar messages Mar 10 00:01:00 fir-io7-s1 kernel: LNetError: 123280:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 00:01:00 fir-io7-s1 kernel: LNetError: 123280:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 238 previous similar messages Mar 10 00:04:41 fir-io7-s1 kernel: LNetError: 123280:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 00:04:41 fir-io7-s1 kernel: LNetError: 123280:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 10 00:07:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 00:07:04 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 00:07:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 10 00:07:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 385 previous similar messages Mar 10 00:11:01 fir-io7-s1 kernel: LNetError: 123512:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 00:11:01 fir-io7-s1 kernel: LNetError: 123512:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 254 previous similar messages Mar 10 00:11:02 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client b1f8c226-9e49-4 (at 10.50.8.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c812c96d800, cur 1583824262 expire 1583824112 last 1583824035 Mar 10 00:11:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 00:14:46 fir-io7-s1 kernel: LNetError: 123841:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 00:14:46 fir-io7-s1 kernel: LNetError: 123841:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 10 00:17:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 00:17:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 10 00:17:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 10 00:17:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 318 previous similar messages Mar 10 00:17:52 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583824665/real 1583824665] req@ffff9c512d89a880 x1652475054407872/t0(0) o106->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/280 e 0 to 1 dl 1583824672 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Mar 10 00:17:52 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 10 00:18:06 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583824679/real 1583824679] req@ffff9c512d89a880 x1652475054407872/t0(0) o106->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/280 e 0 to 1 dl 1583824686 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 00:18:06 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 10 00:18:27 fir-io7-s1 kernel: Lustre: 
101318:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583824700/real 1583824700] req@ffff9c512d89a880 x1652475054407872/t0(0) o106->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/280 e 0 to 1 dl 1583824707 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 00:18:27 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 10 00:19:02 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583824735/real 1583824735] req@ffff9c512d89a880 x1652475054407872/t0(0) o106->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/280 e 0 to 1 dl 1583824742 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 00:19:02 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 10 00:20:12 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583824805/real 1583824805] req@ffff9c512d89a880 x1652475054407872/t0(0) o106->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/280 e 0 to 1 dl 1583824812 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 00:20:12 fir-io7-s1 kernel: Lustre: 101318:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Mar 10 00:21:01 fir-io7-s1 kernel: LNetError: 123305:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 00:21:01 fir-io7-s1 kernel: LNetError: 123305:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages Mar 10 00:21:05 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client f9209f4d-c3b0-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5a08761400, cur 1583824865 expire 1583824715 last 1583824638 Mar 10 00:21:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 00:24:51 fir-io7-s1 kernel: LNetError: 123841:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 00:24:51 fir-io7-s1 kernel: LNetError: 123841:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 10 00:27:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 00:27:07 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 10 00:27:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 00:27:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 499 previous similar messages Mar 10 00:27:40 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 24b4722a-2543-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4a6993b400, cur 1583825260 expire 1583825110 last 1583825033 Mar 10 00:27:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 00:28:26 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 10 00:28:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 00:31:06 fir-io7-s1 kernel: LNetError: 124294:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 00:31:06 fir-io7-s1 kernel: LNetError: 124294:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 240 previous similar messages Mar 10 00:34:53 fir-io7-s1 kernel: LNetError: 124294:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 00:34:53 fir-io7-s1 kernel: LNetError: 124294:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 10 00:37:08 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 10 00:37:08 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 10 00:37:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 10 00:37:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 609 previous similar messages Mar 10 00:41:06 fir-io7-s1 kernel: LNetError: 124690:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 00:41:06 fir-io7-s1 kernel: LNetError: 124690:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 249 previous similar messages Mar 10 00:44:55 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 00:44:55 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 375 previous similar messages Mar 10 00:47:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 00:47:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 10 00:47:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 10 00:47:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 582 previous similar messages Mar 10 00:51:10 fir-io7-s1 kernel: LNetError: 124786:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 00:51:10 fir-io7-s1 kernel: LNetError: 124786:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 272 previous similar messages Mar 10 00:54:57 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 00:54:57 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 372 previous similar messages Mar 10 00:57:14 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 00:57:14 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 10 00:57:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 00:57:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 561 previous similar messages Mar 10 01:01:11 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 01:01:11 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 10 01:04:59 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 01:04:59 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 389 previous similar messages Mar 10 01:07:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 01:07:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 7 previous similar messages Mar 10 01:07:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 01:07:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 341 previous similar messages Mar 10 01:11:11 fir-io7-s1 kernel: LNetError: 125674:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 01:11:11 fir-io7-s1 kernel: LNetError: 125674:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 10 01:15:01 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 01:15:01 fir-io7-s1 kernel: LNetError: 124786:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 387 previous similar messages Mar 10 01:17:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 01:17:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 321 previous similar messages Mar 10 01:21:11 fir-io7-s1 kernel: LNetError: 125674:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 01:21:11 fir-io7-s1 kernel: LNetError: 125674:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages Mar 10 01:22:12 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 01:22:12 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 23 previous similar messages Mar 10 01:25:03 fir-io7-s1 kernel: LNetError: 126036:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 01:25:03 fir-io7-s1 kernel: LNetError: 126036:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 436 previous similar messages Mar 10 01:27:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 10 01:27:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 186 previous similar messages Mar 10 01:31:11 fir-io7-s1 kernel: LNetError: 125674:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 01:31:11 fir-io7-s1 kernel: LNetError: 125674:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 230 previous similar messages Mar 10 01:32:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 01:32:20 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 10 01:35:05 fir-io7-s1 kernel: LNetError: 126036:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 01:35:05 fir-io7-s1 kernel: LNetError: 126036:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 398 previous similar messages Mar 10 01:37:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 01:37:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 178 previous similar messages Mar 10 01:41:16 fir-io7-s1 kernel: LNetError: 126871:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 01:41:16 fir-io7-s1 kernel: LNetError: 126871:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 230 previous similar messages Mar 10 01:42:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 01:42:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 10 01:45:08 fir-io7-s1 kernel: LNetError: 126871:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 01:45:08 fir-io7-s1 kernel: LNetError: 126871:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 430 previous similar messages Mar 10 01:47:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 10 01:47:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 125 previous similar messages Mar 10 01:51:16 fir-io7-s1 kernel: LNetError: 127090:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 01:51:16 fir-io7-s1 kernel: LNetError: 127090:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 10 01:55:10 fir-io7-s1 kernel: LNetError: 127090:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 01:55:10 fir-io7-s1 kernel: LNetError: 127090:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 455 previous similar messages Mar 10 01:57:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 01:57:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 10 01:58:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 10 01:58:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 286 previous similar messages Mar 10 02:01:21 fir-io7-s1 kernel: LNetError: 127090:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 02:01:21 fir-io7-s1 kernel: LNetError: 127090:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 201 previous similar messages Mar 10 02:05:12 fir-io7-s1 kernel: LNetError: 127090:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 02:05:12 fir-io7-s1 kernel: LNetError: 127090:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 10 02:07:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 02:07:24 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 02:08:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 1 seconds Mar 10 02:08:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 629 previous similar messages Mar 10 02:11:21 fir-io7-s1 kernel: LNetError: 127432:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 02:11:21 fir-io7-s1 kernel: LNetError: 127432:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 10 02:15:14 fir-io7-s1 kernel: LNetError: 127090:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 02:15:14 fir-io7-s1 kernel: LNetError: 127090:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 473 previous similar messages Mar 10 02:17:30 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 02:17:30 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 10 02:18:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 02:18:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 540 previous similar messages Mar 10 02:21:21 fir-io7-s1 kernel: LNetError: 128251:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 02:21:21 fir-io7-s1 kernel: LNetError: 128251:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 281 previous similar messages Mar 10 02:25:15 fir-io7-s1 kernel: LNetError: 128173:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 02:25:15 fir-io7-s1 kernel: LNetError: 128173:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 463 previous similar messages Mar 10 02:27:31 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 02:27:31 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 10 02:28:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 02:28:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 528 previous similar messages Mar 10 02:31:21 fir-io7-s1 kernel: LNetError: 128436:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 02:31:21 fir-io7-s1 kernel: LNetError: 128436:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 269 previous similar messages Mar 10 02:35:17 fir-io7-s1 kernel: LNetError: 128173:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 02:35:17 fir-io7-s1 kernel: LNetError: 128173:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 448 previous similar messages Mar 10 02:37:35 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 02:37:35 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 10 02:38:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 02:38:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages Mar 10 02:41:21 fir-io7-s1 kernel: LNetError: 128826:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 02:41:21 fir-io7-s1 kernel: LNetError: 128826:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 256 previous similar messages Mar 10 02:45:19 fir-io7-s1 kernel: LNetError: 128879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 02:45:19 fir-io7-s1 kernel: LNetError: 128879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 437 previous similar messages Mar 10 02:47:35 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 02:47:35 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 10 02:48:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 10 02:48:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 492 previous similar messages Mar 10 02:51:21 fir-io7-s1 kernel: LNetError: 129196:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 02:51:21 fir-io7-s1 kernel: LNetError: 129196:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 270 previous similar messages Mar 10 02:55:21 fir-io7-s1 kernel: LNetError: 129422:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 02:55:21 fir-io7-s1 kernel: LNetError: 129422:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 430 previous similar messages Mar 10 02:57:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 02:57:37 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 10 02:58:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 10 02:58:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 294 previous similar messages Mar 10 03:01:21 fir-io7-s1 kernel: LNetError: 129422:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 03:01:21 fir-io7-s1 kernel: LNetError: 129422:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 10 03:05:21 fir-io7-s1 kernel: LNetError: 129787:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 03:05:21 fir-io7-s1 kernel: LNetError: 129787:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 10 03:07:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 03:07:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 10 03:08:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 10 03:08:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 262 previous similar messages Mar 10 03:11:21 fir-io7-s1 kernel: LNetError: 129787:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 03:11:21 fir-io7-s1 kernel: LNetError: 129787:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 249 previous similar messages Mar 10 03:15:21 fir-io7-s1 kernel: LNetError: 130140:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 03:15:21 fir-io7-s1 kernel: LNetError: 130140:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 446 previous similar messages Mar 10 03:17:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 03:17:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 10 03:18:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 03:18:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 280 previous similar messages Mar 10 03:21:21 fir-io7-s1 kernel: LNetError: 130372:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 03:21:21 fir-io7-s1 kernel: LNetError: 130372:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages Mar 10 03:25:21 fir-io7-s1 kernel: LNetError: 130140:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 03:25:21 fir-io7-s1 kernel: LNetError: 130140:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 387 previous similar messages Mar 10 03:27:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 03:27:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 10 03:28:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 10 03:28:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages Mar 10 03:31:26 fir-io7-s1 kernel: LNetError: 130650:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 03:31:26 fir-io7-s1 kernel: LNetError: 130650:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages Mar 10 03:35:21 fir-io7-s1 kernel: LNetError: 130650:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 03:35:21 fir-io7-s1 kernel: LNetError: 130650:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 10 03:37:46 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 03:37:46 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 10 03:38:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 6 seconds Mar 10 03:38:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 371 previous similar messages Mar 10 03:41:26 fir-io7-s1 kernel: LNetError: 130757:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 03:41:26 fir-io7-s1 kernel: LNetError: 130757:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 242 previous similar messages Mar 10 03:45:21 fir-io7-s1 kernel: LNetError: 75529:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 03:45:21 fir-io7-s1 kernel: LNetError: 75529:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 404 previous similar messages Mar 10 03:47:47 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 03:47:47 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 10 03:48:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 03:48:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages Mar 10 03:51:27 fir-io7-s1 kernel: LNetError: 130965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 03:51:27 fir-io7-s1 kernel: LNetError: 130965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 261 previous similar messages Mar 10 03:55:27 fir-io7-s1 kernel: LNetError: 478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 03:55:27 fir-io7-s1 kernel: LNetError: 478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 406 previous similar messages Mar 10 03:57:29 fir-io7-s1 kernel: LustreError: 68882:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli b6e74d6f-535a-4 claims 86016 GRANT, real grant 28672 Mar 10 03:57:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 03:57:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 10 03:58:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 03:58:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 356 previous similar messages Mar 10 04:01:27 fir-io7-s1 kernel: LNetError: 793:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 04:01:27 fir-io7-s1 kernel: LNetError: 793:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 10 04:05:32 fir-io7-s1 kernel: LNetError: 934:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 04:05:32 fir-io7-s1 kernel: LNetError: 934:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 376 previous similar messages Mar 10 04:08:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 04:08:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 335 previous similar messages Mar 10 04:11:27 fir-io7-s1 kernel: LNetError: 1232:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 04:11:27 fir-io7-s1 kernel: LNetError: 1232:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages Mar 10 04:12:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 10 04:12:48 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 10 04:15:32 fir-io7-s1 kernel: LNetError: 1486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 04:15:32 fir-io7-s1 kernel: LNetError: 1486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 409 previous similar messages Mar 10 04:18:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 04:18:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 151 previous similar messages Mar 10 04:21:27 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 04:21:27 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 10 04:22:49 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 04:22:49 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 04:25:32 fir-io7-s1 kernel: LNetError: 1486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 04:25:32 fir-io7-s1 kernel: LNetError: 1486:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 436 previous similar messages Mar 10 04:28:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 04:28:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 581 previous similar messages Mar 10 04:31:32 fir-io7-s1 kernel: LNetError: 2007:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 04:31:32 fir-io7-s1 kernel: LNetError: 2007:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages Mar 10 04:32:53 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 04:32:53 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 10 04:35:32 fir-io7-s1 kernel: LNetError: 2007:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 04:35:32 fir-io7-s1 kernel: LNetError: 2007:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 456 previous similar messages Mar 10 04:39:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 10 04:39:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 562 previous similar messages Mar 10 04:41:32 fir-io7-s1 kernel: LNetError: 2007:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 04:41:32 fir-io7-s1 kernel: LNetError: 2007:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 267 previous similar messages Mar 10 04:42:53 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 04:42:53 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 04:45:32 fir-io7-s1 kernel: LNetError: 2007:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 04:45:32 fir-io7-s1 kernel: LNetError: 2007:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 442 previous similar messages Mar 10 04:49:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 6 seconds Mar 10 04:49:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 444 previous similar messages Mar 10 04:51:32 fir-io7-s1 kernel: LNetError: 2734:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 04:51:32 fir-io7-s1 kernel: LNetError: 2734:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 262 previous similar messages Mar 10 04:52:55 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 04:52:55 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 10 04:55:32 fir-io7-s1 kernel: LNetError: 2714:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 04:55:32 fir-io7-s1 kernel: LNetError: 2714:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 453 previous similar messages Mar 10 04:59:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 10 04:59:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 249 previous similar messages Mar 10 05:01:32 fir-io7-s1 kernel: LNetError: 2714:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 05:01:32 fir-io7-s1 kernel: LNetError: 2714:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 10 05:02:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 05:02:58 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 05:05:32 fir-io7-s1 kernel: LNetError: 2714:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 05:05:32 fir-io7-s1 kernel: LNetError: 2714:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 445 previous similar messages Mar 10 05:09:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 10 05:09:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 66 previous similar messages Mar 10 05:11:32 fir-io7-s1 kernel: LNetError: 3426:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 05:11:32 fir-io7-s1 kernel: LNetError: 3426:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 10 05:13:00 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 05:13:00 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 05:15:32 fir-io7-s1 kernel: LNetError: 3426:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 05:15:32 fir-io7-s1 kernel: LNetError: 3426:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 401 previous similar messages Mar 10 05:19:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 10 05:19:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 281 previous similar messages Mar 10 05:21:32 fir-io7-s1 kernel: LNetError: 3426:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 05:21:32 fir-io7-s1 kernel: LNetError: 3426:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 252 previous similar messages Mar 10 05:23:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 05:23:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 05:25:32 fir-io7-s1 kernel: LNetError: 85823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 05:25:32 fir-io7-s1 kernel: LNetError: 85823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 445 previous similar messages Mar 10 05:29:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 05:29:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 211 previous similar messages Mar 10 05:31:32 fir-io7-s1 kernel: LNetError: 3988:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 05:31:32 fir-io7-s1 kernel: LNetError: 3988:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages Mar 10 05:33:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 05:33:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 05:35:32 fir-io7-s1 kernel: LNetError: 3988:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 05:35:32 fir-io7-s1 kernel: LNetError: 3988:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 448 previous similar messages Mar 10 05:39:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 10 05:39:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 145 previous similar messages Mar 10 05:41:35 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 05:41:35 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 10 05:43:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 05:43:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 05:45:37 fir-io7-s1 kernel: LNetError: 3988:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 05:45:37 fir-io7-s1 kernel: LNetError: 3988:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 461 previous similar messages Mar 10 05:50:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 05:50:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 68 previous similar messages Mar 10 05:51:37 fir-io7-s1 kernel: LNetError: 5050:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 05:51:37 fir-io7-s1 kernel: LNetError: 5050:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 10 05:53:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 05:53:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 10 05:55:37 fir-io7-s1 kernel: LNetError: 5050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 05:55:37 fir-io7-s1 kernel: LNetError: 5050:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 408 previous similar messages Mar 10 06:00:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 10 06:00:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 397 previous similar messages Mar 10 06:01:37 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 06:01:37 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 10 06:03:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 06:03:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 10 06:05:37 fir-io7-s1 kernel: LNetError: 5230:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 06:05:37 fir-io7-s1 kernel: LNetError: 5230:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 376 previous similar messages Mar 10 06:10:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 06:10:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 324 previous similar messages Mar 10 06:11:37 fir-io7-s1 kernel: LNetError: 5559:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 06:11:37 fir-io7-s1 kernel: LNetError: 5559:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages Mar 10 06:13:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 06:13:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 10 06:15:37 fir-io7-s1 kernel: LNetError: 5559:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 06:15:37 fir-io7-s1 kernel: LNetError: 5559:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 419 previous similar messages Mar 10 06:20:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 10 06:20:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 206 previous similar messages Mar 10 06:21:37 fir-io7-s1 kernel: LNetError: 6131:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 06:21:37 fir-io7-s1 kernel: LNetError: 6131:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 251 previous similar messages Mar 10 06:23:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 06:23:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 06:25:37 fir-io7-s1 kernel: LNetError: 6131:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 06:25:37 fir-io7-s1 kernel: LNetError: 6131:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 433 previous similar messages Mar 10 06:30:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 06:30:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 198 previous similar messages Mar 10 06:31:37 fir-io7-s1 kernel: LNetError: 6337:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 06:31:37 fir-io7-s1 kernel: LNetError: 6337:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 254 previous similar messages Mar 10 06:33:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 06:33:13 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 10 06:35:37 fir-io7-s1 kernel: LNetError: 6337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 06:35:37 fir-io7-s1 kernel: LNetError: 6337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 383 previous similar messages Mar 10 06:40:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 06:40:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 261 previous similar messages Mar 10 06:41:37 fir-io7-s1 kernel: LNetError: 6864:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 06:41:37 fir-io7-s1 kernel: LNetError: 6864:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages Mar 10 06:43:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 06:43:18 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 10 06:45:37 fir-io7-s1 kernel: LNetError: 6864:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 06:45:37 fir-io7-s1 kernel: LNetError: 6864:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 442 previous similar messages Mar 10 06:50:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 10 06:50:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages Mar 10 06:51:37 fir-io7-s1 kernel: LNetError: 6864:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 06:51:37 fir-io7-s1 kernel: LNetError: 6864:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 10 06:53:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 06:53:19 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 06:55:37 fir-io7-s1 kernel: LNetError: 6864:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 06:55:37 fir-io7-s1 kernel: LNetError: 6864:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 423 previous similar messages Mar 10 07:00:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 10 07:00:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 63 previous similar messages Mar 10 07:01:37 fir-io7-s1 kernel: LNetError: 5643:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 07:01:37 fir-io7-s1 kernel: LNetError: 5643:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 10 07:03:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.235@o2ib7: -125 Mar 10 07:03:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 12 previous similar messages Mar 10 07:05:37 fir-io7-s1 kernel: LNetError: 75529:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 07:05:37 fir-io7-s1 kernel: LNetError: 75529:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 381 previous similar messages Mar 10 07:11:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 10 07:11:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 95 previous similar messages Mar 10 07:11:37 fir-io7-s1 kernel: LNetError: 6864:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 07:11:37 fir-io7-s1 kernel: LNetError: 6864:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 10 07:13:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 07:13:25 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 19 previous similar messages Mar 10 07:15:37 fir-io7-s1 kernel: LNetError: 85823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 07:15:37 fir-io7-s1 kernel: LNetError: 85823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 418 previous similar messages Mar 10 07:21:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 10 07:21:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 130 previous similar messages Mar 10 07:21:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 07:21:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 10 07:23:30 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 07:23:30 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 10 07:25:47 fir-io7-s1 kernel: LNetError: 8329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 07:25:47 fir-io7-s1 kernel: LNetError: 8329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 454 previous similar messages Mar 10 07:31:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 07:31:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 292 previous similar messages Mar 10 07:31:42 fir-io7-s1 kernel: LNetError: 8329:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 07:31:42 fir-io7-s1 kernel: LNetError: 8329:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 245 previous similar messages Mar 10 07:33:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 07:33:33 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 10 07:35:47 fir-io7-s1 kernel: LNetError: 8329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 07:35:47 fir-io7-s1 kernel: LNetError: 8329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 433 previous similar messages Mar 10 07:41:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds Mar 10 07:41:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 555 previous similar messages Mar 10 07:41:47 fir-io7-s1 kernel: LNetError: 8329:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 07:41:47 fir-io7-s1 kernel: LNetError: 8329:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 10 07:43:36 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 07:43:36 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 8 previous similar messages Mar 10 07:45:52 fir-io7-s1 kernel: LNetError: 9101:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 07:45:52 fir-io7-s1 kernel: LNetError: 9101:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 416 previous similar messages Mar 10 07:51:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 07:51:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 693 previous similar messages Mar 10 07:51:52 fir-io7-s1 kernel: LNetError: 85823:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 07:51:52 fir-io7-s1 kernel: LNetError: 85823:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages Mar 10 07:53:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 07:53:38 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 10 07:55:52 fir-io7-s1 kernel: LNetError: 9101:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 07:55:52 fir-io7-s1 kernel: LNetError: 9101:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 436 previous similar messages Mar 10 08:01:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 1 seconds Mar 10 08:01:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 324 previous similar messages Mar 10 08:01:52 fir-io7-s1 kernel: LNetError: 9101:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 08:01:52 fir-io7-s1 kernel: LNetError: 9101:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 259 previous similar messages Mar 10 08:03:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.239@o2ib7: -125 Mar 10 08:03:40 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 13 previous similar messages Mar 10 08:05:57 fir-io7-s1 kernel: LNetError: 9725:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 08:05:57 fir-io7-s1 kernel: LNetError: 9725:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 381 previous similar messages Mar 10 08:11:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 1 seconds Mar 10 08:11:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 620 previous similar messages Mar 10 08:11:52 fir-io7-s1 kernel: LNetError: 9909:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 08:11:52 fir-io7-s1 kernel: LNetError: 9909:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 305 previous similar messages Mar 10 08:13:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 08:13:41 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 11 previous similar messages Mar 10 08:15:57 fir-io7-s1 kernel: LNetError: 10118:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 08:15:57 fir-io7-s1 kernel: LNetError: 10118:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 468 previous similar messages Mar 10 08:21:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 10 08:21:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 511 previous similar messages Mar 10 08:21:52 fir-io7-s1 kernel: LNetError: 10297:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 08:21:52 fir-io7-s1 kernel: LNetError: 10297:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 274 previous similar messages Mar 10 08:23:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 08:23:42 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 10 08:26:02 fir-io7-s1 kernel: LNetError: 10297:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 08:26:02 fir-io7-s1 kernel: LNetError: 10297:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 10 08:31:52 fir-io7-s1 kernel: LNetError: 10572:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 08:31:52 fir-io7-s1 kernel: LNetError: 10572:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 260 previous similar messages Mar 10 08:32:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 10 08:32:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 324 previous similar messages Mar 10 08:33:46 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 08:33:46 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 16 previous similar messages Mar 10 08:36:12 fir-io7-s1 kernel: LNetError: 10776:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 08:36:12 fir-io7-s1 kernel: LNetError: 10776:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 418 previous similar messages Mar 10 08:41:52 fir-io7-s1 kernel: LNetError: 10684:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery 
queue. Health = 900 Mar 10 08:41:52 fir-io7-s1 kernel: LNetError: 10684:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 274 previous similar messages Mar 10 08:42:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 08:42:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 543 previous similar messages Mar 10 08:46:17 fir-io7-s1 kernel: LNetError: 10776:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 08:46:17 fir-io7-s1 kernel: LNetError: 10776:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 452 previous similar messages Mar 10 08:48:43 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.239@o2ib7: -125 Mar 10 08:48:43 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 10 08:51:52 fir-io7-s1 kernel: LNetError: 10776:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 08:51:52 fir-io7-s1 kernel: LNetError: 10776:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 272 previous similar messages Mar 10 08:52:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 10 08:52:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 392 previous similar messages Mar 10 08:56:17 fir-io7-s1 kernel: LNetError: 11485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 08:56:17 fir-io7-s1 kernel: LNetError: 11485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 10 08:58:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 08:58:45 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 09:01:52 fir-io7-s1 kernel: LNetError: 11485:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 09:01:52 fir-io7-s1 kernel: LNetError: 11485:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 253 previous similar messages Mar 10 09:02:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 10 09:02:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 282 previous similar messages Mar 10 09:06:17 fir-io7-s1 kernel: LNetError: 11485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 09:06:17 fir-io7-s1 kernel: LNetError: 11485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 430 previous similar messages Mar 10 09:08:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 09:08:51 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 09:11:52 fir-io7-s1 kernel: LNetError: 12038:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 09:11:52 fir-io7-s1 kernel: LNetError: 12038:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 286 previous similar messages Mar 10 09:12:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 10 09:12:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 420 previous similar messages Mar 10 09:16:22 fir-io7-s1 kernel: LNetError: 12004:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 09:16:22 fir-io7-s1 kernel: LNetError: 12004:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 431 previous similar messages Mar 10 09:21:52 fir-io7-s1 kernel: LNetError: 12354:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 09:21:52 fir-io7-s1 kernel: LNetError: 12354:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 277 previous similar messages Mar 10 09:22:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 09:22:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 379 previous similar messages Mar 10 09:23:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 09:23:50 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 23 previous similar messages Mar 10 09:26:27 fir-io7-s1 kernel: LNetError: 12354:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 09:26:27 fir-io7-s1 kernel: LNetError: 12354:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 415 previous similar messages Mar 10 09:31:52 fir-io7-s1 kernel: LNetError: 12216:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 09:31:52 fir-io7-s1 kernel: LNetError: 12216:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 255 previous similar messages Mar 10 09:32:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 09:32:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 375 previous similar messages Mar 10 09:33:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 09:33:56 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 09:36:32 fir-io7-s1 kernel: LNetError: 13110:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 09:36:32 fir-io7-s1 kernel: LNetError: 13110:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 437 previous similar messages Mar 10 09:41:55 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 09:41:55 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 237 previous similar messages Mar 10 09:42:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 10 09:42:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 183 previous similar messages Mar 10 09:44:01 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.238@o2ib7: -125 Mar 10 09:44:01 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 20 previous similar messages Mar 10 09:46:32 fir-io7-s1 kernel: LNetError: 13350:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 09:46:32 fir-io7-s1 kernel: LNetError: 13350:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 451 previous similar messages Mar 10 09:51:57 fir-io7-s1 kernel: LNetError: 13557:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 09:51:57 fir-io7-s1 kernel: LNetError: 13557:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 10 09:52:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 10 09:52:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages Mar 10 09:54:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.234@o2ib7: -125 Mar 10 09:54:02 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 15 previous similar messages Mar 10 09:54:59 fir-io7-s1 kernel: LustreError: 40844:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0050: cli 83308672-91dd-4 claims 28672 GRANT, real grant 24576 Mar 10 09:56:32 fir-io7-s1 kernel: LNetError: 13557:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 09:56:32 fir-io7-s1 kernel: LNetError: 13557:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 10 10:01:57 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 10:01:57 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 10 10:02:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 10 10:02:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 411 previous similar messages Mar 10 10:04:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 10:04:03 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 10 10:06:32 fir-io7-s1 kernel: LNetError: 13875:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 10:06:32 fir-io7-s1 kernel: LNetError: 13875:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 436 previous similar messages Mar 10 10:11:57 fir-io7-s1 kernel: LNetError: 14399:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 10:11:57 fir-io7-s1 kernel: LNetError: 14399:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 251 previous similar messages Mar 10 10:12:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 10 10:12:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 200 previous similar messages Mar 10 10:14:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 10:14:05 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 10 10:16:37 fir-io7-s1 kernel: LNetError: 14399:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 10:16:37 fir-io7-s1 kernel: LNetError: 14399:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 402 previous similar messages Mar 10 10:21:57 fir-io7-s1 kernel: LNetError: 13718:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 10:21:57 fir-io7-s1 kernel: LNetError: 13718:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages Mar 10 10:22:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 10 10:22:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 143 previous similar messages Mar 10 10:24:06 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 10:24:06 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 14 previous similar messages Mar 10 10:26:37 fir-io7-s1 kernel: LNetError: 14610:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 10:26:37 fir-io7-s1 kernel: LNetError: 14610:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 10 10:31:57 fir-io7-s1 kernel: LNetError: 14610:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 10:31:57 fir-io7-s1 kernel: LNetError: 14610:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 271 previous similar messages Mar 10 10:32:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 10 10:32:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 444 previous similar messages Mar 10 10:34:08 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.232@o2ib7: -125 Mar 10 10:34:08 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 22 previous similar messages Mar 10 10:36:37 fir-io7-s1 kernel: LNetError: 14610:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 10:36:37 fir-io7-s1 kernel: LNetError: 14610:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 401 previous similar messages Mar 10 10:41:57 fir-io7-s1 kernel: LNetError: 14610:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 10:41:57 fir-io7-s1 kernel: LNetError: 14610:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 10 10:42:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 10 10:42:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 247 previous similar messages Mar 10 10:44:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 10:44:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 9 previous similar messages Mar 10 10:46:37 fir-io7-s1 kernel: LNetError: 15501:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 10:46:37 fir-io7-s1 kernel: LNetError: 15501:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 418 previous similar messages Mar 10 10:51:57 fir-io7-s1 kernel: LNetError: 15831:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 10:51:57 fir-io7-s1 kernel: LNetError: 15831:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 253 previous similar messages Mar 10 10:52:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 10 10:52:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 296 previous similar messages Mar 10 10:54:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.233@o2ib7: -125 Mar 10 10:54:10 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 10 10:56:37 fir-io7-s1 kernel: LNetError: 15831:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 10:56:37 fir-io7-s1 kernel: LNetError: 15831:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 396 previous similar messages Mar 10 11:01:57 fir-io7-s1 kernel: LNetError: 15831:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 11:01:57 fir-io7-s1 kernel: LNetError: 15831:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages Mar 10 11:02:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 3 seconds Mar 10 11:02:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages Mar 10 11:04:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.239@o2ib7: -125 Mar 10 11:04:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 17 previous similar messages Mar 10 11:06:37 fir-io7-s1 kernel: LNetError: 15831:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 11:06:37 fir-io7-s1 kernel: LNetError: 15831:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 406 previous similar messages Mar 10 11:11:57 fir-io7-s1 kernel: LNetError: 16370:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 11:11:57 fir-io7-s1 kernel: LNetError: 16370:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages Mar 10 11:13:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 10 11:13:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 152 previous similar messages Mar 10 11:14:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 11:14:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 10 previous similar messages Mar 10 11:16:37 fir-io7-s1 kernel: LNetError: 16564:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 11:16:37 fir-io7-s1 kernel: LNetError: 16564:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 385 previous similar messages Mar 10 11:21:57 fir-io7-s1 kernel: LNetError: 16745:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 11:21:57 fir-io7-s1 kernel: LNetError: 16745:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 10 11:23:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 11:23:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 71 previous similar messages Mar 10 11:24:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.236@o2ib7: -125 Mar 10 11:24:15 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 18 previous similar messages Mar 10 11:26:37 fir-io7-s1 kernel: LNetError: 16745:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 11:26:37 fir-io7-s1 kernel: LNetError: 16745:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 348 previous similar messages Mar 10 11:31:57 fir-io7-s1 kernel: LNetError: 17372:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 11:31:57 fir-io7-s1 kernel: LNetError: 17372:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 10 11:33:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 10 11:33:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 97 previous similar messages Mar 10 11:36:37 fir-io7-s1 kernel: LNetError: 17372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 11:36:37 fir-io7-s1 kernel: LNetError: 17372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 368 previous similar messages Mar 10 11:36:54 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to eccb4691-6953-4 (at 10.50.8.35@o2ib2) Mar 10 11:36:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 11:41:57 fir-io7-s1 kernel: LNetError: 17939:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 11:41:57 fir-io7-s1 kernel: LNetError: 17939:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 152 previous similar messages Mar 10 11:43:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 10 11:43:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 53 previous similar messages Mar 10 11:46:38 fir-io7-s1 kernel: LNetError: 17813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 11:46:38 fir-io7-s1 kernel: LNetError: 17813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 10 11:51:58 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 11:51:58 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 10 11:53:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 10 11:53:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 132 previous similar messages Mar 10 11:55:02 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 3289c24b-04bc-4 (at 10.49.0.61@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c746bc8a400, cur 1583866502 expire 1583866352 last 1583866275 Mar 10 11:55:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 11:56:42 fir-io7-s1 kernel: LNetError: 85823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 11:56:42 fir-io7-s1 kernel: LNetError: 85823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 398 previous similar messages Mar 10 12:02:02 fir-io7-s1 kernel: LNetError: 18909:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 12:02:02 fir-io7-s1 kernel: LNetError: 18909:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 10 12:03:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 10 12:03:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 236 previous similar messages Mar 10 12:06:42 fir-io7-s1 kernel: LNetError: 18909:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 12:06:42 fir-io7-s1 kernel: LNetError: 18909:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 373 previous similar messages Mar 10 12:12:02 fir-io7-s1 kernel: LNetError: 18909:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 12:12:02 fir-io7-s1 kernel: LNetError: 18909:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 10 12:14:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 12:14:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 496 previous similar messages Mar 10 12:16:42 fir-io7-s1 kernel: LNetError: 19362:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 12:16:42 fir-io7-s1 kernel: LNetError: 19362:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 395 previous similar messages Mar 10 12:22:02 fir-io7-s1 kernel: LNetError: 19690:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 12:22:02 fir-io7-s1 kernel: LNetError: 19690:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 10 12:24:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 10 12:24:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 316 previous similar messages Mar 10 12:26:42 fir-io7-s1 kernel: LNetError: 19690:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 12:26:42 fir-io7-s1 kernel: LNetError: 19690:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 10 12:32:02 fir-io7-s1 kernel: LNetError: 19872:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 12:32:02 fir-io7-s1 kernel: LNetError: 19872:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 10 12:34:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 10 12:34:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 193 previous similar messages Mar 10 12:36:42 fir-io7-s1 kernel: LNetError: 20081:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 12:36:42 fir-io7-s1 kernel: LNetError: 20081:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 10 12:37:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 3289c24b-04bc-4 (at 10.49.0.61@o2ib1) Mar 10 12:37:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 12:42:02 fir-io7-s1 kernel: LNetError: 20367:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 12:42:02 fir-io7-s1 kernel: LNetError: 20367:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 211 previous similar messages Mar 10 12:44:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 3 seconds Mar 10 12:44:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 267 previous similar messages Mar 10 12:46:42 fir-io7-s1 kernel: LNetError: 20081:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 12:46:42 fir-io7-s1 kernel: LNetError: 20081:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages Mar 10 12:52:02 fir-io7-s1 kernel: LNetError: 20081:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 12:52:02 fir-io7-s1 kernel: LNetError: 20081:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 10 12:54:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 12:54:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 226 previous similar messages Mar 10 12:56:42 fir-io7-s1 kernel: LNetError: 20789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 12:56:42 fir-io7-s1 kernel: LNetError: 20789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 379 previous similar messages Mar 10 13:02:02 fir-io7-s1 kernel: LNetError: 20543:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 13:02:02 fir-io7-s1 kernel: LNetError: 20543:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 10 13:04:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 10 13:04:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 221 previous similar messages Mar 10 13:06:42 fir-io7-s1 kernel: LNetError: 20789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 13:06:42 fir-io7-s1 kernel: LNetError: 20789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 391 previous similar messages Mar 10 13:12:02 fir-io7-s1 kernel: LNetError: 21303:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 13:12:02 fir-io7-s1 kernel: LNetError: 21303:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 10 13:15:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 10 13:15:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 164 previous similar messages Mar 10 13:16:42 fir-io7-s1 kernel: LNetError: 21303:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 13:16:42 fir-io7-s1 kernel: LNetError: 21303:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 409 previous similar messages Mar 10 13:21:30 fir-io7-s1 kernel: LustreError: 84743:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004c: cli 38b8739f-ddc9-4 claims 32768 GRANT, real grant 28672 Mar 10 13:22:02 fir-io7-s1 kernel: LNetError: 21303:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 13:22:02 fir-io7-s1 kernel: LNetError: 21303:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 10 13:25:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 10 13:25:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 108 previous similar messages Mar 10 13:26:42 fir-io7-s1 kernel: LNetError: 21856:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 13:26:42 fir-io7-s1 kernel: LNetError: 21856:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 10 13:32:02 fir-io7-s1 kernel: LNetError: 22158:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 13:32:02 fir-io7-s1 kernel: LNetError: 22158:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 152 previous similar messages Mar 10 13:35:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 10 13:35:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 59 previous similar messages Mar 10 13:36:42 fir-io7-s1 kernel: LNetError: 22158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 13:36:42 fir-io7-s1 kernel: LNetError: 22158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 404 previous similar messages Mar 10 13:42:07 fir-io7-s1 kernel: LNetError: 22344:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 13:42:07 fir-io7-s1 kernel: LNetError: 22344:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 10 13:45:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 13:45:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 88 previous similar messages Mar 10 13:46:47 fir-io7-s1 kernel: LNetError: 22553:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 13:46:47 fir-io7-s1 kernel: LNetError: 22553:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 10 13:52:07 fir-io7-s1 kernel: LNetError: 21506:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 13:52:07 fir-io7-s1 kernel: LNetError: 21506:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 155 previous similar messages Mar 10 13:55:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 10 13:55:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 48 previous similar messages Mar 10 13:56:47 fir-io7-s1 kernel: LNetError: 21506:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 13:56:47 fir-io7-s1 kernel: LNetError: 21506:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 10 14:02:07 fir-io7-s1 kernel: LNetError: 85823:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 14:02:07 fir-io7-s1 kernel: LNetError: 85823:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 10 14:05:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 10 14:05:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 316 previous similar messages Mar 10 14:06:47 fir-io7-s1 kernel: LNetError: 23271:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 14:06:47 fir-io7-s1 kernel: LNetError: 23271:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 10 14:12:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 14:12:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 10 14:15:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 10 14:15:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 10 14:16:52 fir-io7-s1 kernel: LNetError: 23271:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 14:16:52 fir-io7-s1 kernel: LNetError: 23271:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 10 14:22:07 fir-io7-s1 kernel: LNetError: 23770:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 14:22:07 fir-io7-s1 kernel: LNetError: 23770:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 10 14:25:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 10 14:25:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 308 previous similar messages Mar 10 14:26:52 fir-io7-s1 kernel: LNetError: 23969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 14:26:52 fir-io7-s1 kernel: LNetError: 23969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 404 previous similar messages Mar 10 14:32:07 fir-io7-s1 kernel: LNetError: 24002:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 14:32:07 fir-io7-s1 kernel: LNetError: 24002:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 10 14:35:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 10 14:35:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 214 previous similar messages Mar 10 14:36:52 fir-io7-s1 kernel: LNetError: 23969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 14:36:52 fir-io7-s1 kernel: LNetError: 23969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 332 previous similar messages Mar 10 14:42:07 fir-io7-s1 kernel: LNetError: 24554:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 14:42:07 fir-io7-s1 kernel: LNetError: 24554:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 10 14:45:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 10 14:45:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 170 previous similar messages Mar 10 14:46:52 fir-io7-s1 kernel: LNetError: 24745:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 14:46:52 fir-io7-s1 kernel: LNetError: 24745:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 362 previous similar messages Mar 10 14:52:07 fir-io7-s1 kernel: LNetError: 24746:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 14:52:07 fir-io7-s1 kernel: LNetError: 24746:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 10 14:54:01 fir-io7-s1 kernel: LustreError: 68364:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli ef7e7ddb-0d3b-4 claims 16723968 GRANT, real grant 16347136 Mar 10 14:55:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 10 14:55:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 216 previous similar messages Mar 10 14:56:52 fir-io7-s1 kernel: LNetError: 24983:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 14:56:52 fir-io7-s1 kernel: LNetError: 24983:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 10 15:02:07 fir-io7-s1 kernel: LNetError: 25271:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 15:02:07 fir-io7-s1 kernel: LNetError: 25271:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 10 15:05:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 15:05:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 183 previous similar messages Mar 10 15:06:52 fir-io7-s1 kernel: LNetError: 25480:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 15:06:52 fir-io7-s1 kernel: LNetError: 25480:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 338 previous similar messages Mar 10 15:12:07 fir-io7-s1 kernel: LNetError: 25650:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 15:12:07 fir-io7-s1 kernel: LNetError: 25650:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 10 15:16:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 15:16:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages Mar 10 15:16:24 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client c3b61f8f-6175-4 (at 10.50.13.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c75e4962c00, cur 1583878584 expire 1583878434 last 1583878357 Mar 10 15:16:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 15:17:02 fir-io7-s1 kernel: LNetError: 25650:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 15:17:02 fir-io7-s1 kernel: LNetError: 25650:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 416 previous similar messages Mar 10 15:22:07 fir-io7-s1 kernel: LNetError: 25617:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 15:22:07 fir-io7-s1 kernel: LNetError: 25617:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 10 15:26:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 15:26:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 153 previous similar messages Mar 10 15:27:02 fir-io7-s1 kernel: LNetError: 25995:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 15:27:02 fir-io7-s1 kernel: LNetError: 25995:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 10 15:32:07 fir-io7-s1 kernel: LNetError: 26351:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 15:32:07 fir-io7-s1 kernel: LNetError: 26351:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 10 15:36:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2e5b5bd1-8cad-4 (at 10.50.13.1@o2ib2) Mar 10 15:36:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 15:36:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 10 15:36:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 123 previous similar messages Mar 10 15:37:02 fir-io7-s1 kernel: LNetError: 26351:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 15:37:02 fir-io7-s1 kernel: LNetError: 26351:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 406 previous similar messages Mar 10 15:42:07 fir-io7-s1 kernel: LNetError: 26351:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 15:42:07 fir-io7-s1 kernel: LNetError: 26351:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 10 15:44:13 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 55f5a1fa-5644-4 (at 10.49.23.22@o2ib1) Mar 10 15:44:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 15:45:46 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 3e887b31-c0fb-4 (at 10.50.2.31@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c69929ca000, cur 1583880346 expire 1583880196 last 1583880119 Mar 10 15:45:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 15:47:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 10 15:47:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 85 previous similar messages Mar 10 15:47:02 fir-io7-s1 kernel: LNetError: 26882:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 15:47:02 fir-io7-s1 kernel: LNetError: 26882:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 10 15:52:07 fir-io7-s1 kernel: LNetError: 26882:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 15:52:07 fir-io7-s1 kernel: LNetError: 26882:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 10 15:57:02 fir-io7-s1 kernel: LNetError: 27238:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 15:57:02 fir-io7-s1 kernel: LNetError: 27238:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 385 previous similar messages Mar 10 15:57:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 10 15:57:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 125 previous similar messages Mar 10 16:02:10 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 16:02:10 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 10 16:07:12 fir-io7-s1 kernel: LNetError: 27238:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 16:07:12 fir-io7-s1 kernel: LNetError: 27238:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 10 16:07:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 2 seconds Mar 10 16:07:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 161 previous similar messages Mar 10 16:08:15 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to 8887833d-7791-4 (at 10.50.10.34@o2ib2) Mar 10 16:08:15 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 10 16:08:33 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 6d5489c2-d040-4 (at 10.50.3.52@o2ib2) Mar 10 16:08:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:08:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 7ab3ec98-ea52-4 (at 10.50.3.61@o2ib2) Mar 10 16:08:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:08:55 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to b1f8c226-9e49-4 (at 10.50.8.1@o2ib2) Mar 10 16:08:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:10:31 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to a693f260-8215-4 (at 10.50.8.38@o2ib2) Mar 10 16:10:31 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 10 16:10:40 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 3e887b31-c0fb-4 (at 10.50.2.31@o2ib2) Mar 10 16:10:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:12:12 fir-io7-s1 kernel: LNetError: 27813:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to 
recovery queue. Health = 900 Mar 10 16:12:12 fir-io7-s1 kernel: LNetError: 27813:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 10 16:17:12 fir-io7-s1 kernel: LNetError: 27813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 16:17:12 fir-io7-s1 kernel: LNetError: 27813:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 10 16:17:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 10 16:17:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 86 previous similar messages Mar 10 16:20:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 10 16:20:13 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 10 16:20:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 10 16:20:51 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 10 16:22:12 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 16:22:12 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 10 16:24:12 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 4cb2ea92-45be-4 (at 10.50.14.3@o2ib2) Mar 10 16:24:12 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:27:12 fir-io7-s1 kernel: LNetError: 28134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 16:27:12 fir-io7-s1 kernel: LNetError: 28134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 10 16:27:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 10 16:27:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 497 previous similar messages Mar 10 16:31:49 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 10 16:31:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:32:12 fir-io7-s1 kernel: LNetError: 27633:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 16:32:12 fir-io7-s1 kernel: LNetError: 27633:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 10 16:32:48 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 6ec2deac-d729-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f63e000, cur 1583883168 expire 1583883018 last 1583882941 Mar 10 16:32:48 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:36:27 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client f739db77-d08d-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990ef9800, cur 1583883387 expire 1583883237 last 1583883160 Mar 10 16:36:27 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 16:37:12 fir-io7-s1 kernel: LNetError: 27633:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 16:37:12 fir-io7-s1 kernel: LNetError: 27633:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 402 previous similar messages Mar 10 16:37:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 16:37:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 303 previous similar messages Mar 10 16:42:12 fir-io7-s1 kernel: LNetError: 28826:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 16:42:12 fir-io7-s1 kernel: LNetError: 28826:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 10 16:47:12 fir-io7-s1 kernel: LNetError: 28826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 16:47:12 fir-io7-s1 kernel: LNetError: 28826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 10 16:47:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 10 16:47:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 159 previous similar messages Mar 10 16:52:12 fir-io7-s1 kernel: LNetError: 28826:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 16:52:12 fir-io7-s1 kernel: LNetError: 28826:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 10 16:57:12 fir-io7-s1 kernel: LNetError: 29357:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 16:57:12 fir-io7-s1 kernel: LNetError: 29357:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 10 16:57:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 10 16:57:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 154 previous similar messages Mar 10 17:02:12 fir-io7-s1 kernel: LNetError: 29595:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 17:02:12 fir-io7-s1 kernel: LNetError: 29595:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 10 17:07:12 fir-io7-s1 kernel: LNetError: 29595:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 17:07:12 fir-io7-s1 kernel: LNetError: 29595:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 369 previous similar messages Mar 10 17:07:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 10 17:07:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 113 previous similar messages Mar 10 17:08:15 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583885288/real 1583885288] req@ffff9c6fb9c71680 x1652475171091968/t0(0) o104->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1583885295 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Mar 10 17:08:15 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Mar 10 
17:08:36 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583885309/real 1583885309] req@ffff9c6fb9c71680 x1652475171091968/t0(0) o104->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1583885316 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 17:08:36 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 10 17:09:11 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583885344/real 1583885344] req@ffff9c6fb9c71680 x1652475171091968/t0(0) o104->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1583885351 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 17:09:11 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 10 17:10:21 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583885414/real 1583885414] req@ffff9c6fb9c71680 x1652475171091968/t0(0) o104->fir-OST004c@10.50.9.37@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1583885421 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 10 17:10:21 fir-io7-s1 kernel: Lustre: 68684:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Mar 10 17:10:42 fir-io7-s1 kernel: LustreError: 68684:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.50.9.37@o2ib2) failed to reply to blocking AST (req@ffff9c6fb9c71680 x1652475171091968 status 0 rc -110), evict it ns: filter-fir-OST004c_UUID lock: ffff9c4a388a5e80/0x3bd9b85258d2c2a5 lrc: 4/0,0 mode: PW/PW res: [0x5902f5f:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->126975) flags: 0x60000400030020 nid: 10.50.9.37@o2ib2 remote: 0xe8ebac1dee879ce8 expref: 28 pid: 85262 timeout: 7984846 lvb_type: 0 Mar 10 17:10:42 fir-io7-s1 kernel: LustreError: 
68684:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message Mar 10 17:10:42 fir-io7-s1 kernel: LustreError: 138-a: fir-OST004c: A client on nid 10.50.9.37@o2ib2 was evicted due to a lock blocking callback time out: rc -110 Mar 10 17:10:42 fir-io7-s1 kernel: LustreError: Skipped 1 previous similar message Mar 10 17:10:42 fir-io7-s1 kernel: LustreError: 66897:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.50.9.37@o2ib2 ns: filter-fir-OST004c_UUID lock: ffff9c4a388a5e80/0x3bd9b85258d2c2a5 lrc: 3/0,0 mode: PW/PW res: [0x5902f5f:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->126975) flags: 0x60000400030020 nid: 10.50.9.37@o2ib2 remote: 0xe8ebac1dee879ce8 expref: 29 pid: 85262 timeout: 0 lvb_type: 0 Mar 10 17:11:37 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 8a6b175a-81e5-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c78e1f1ac00, cur 1583885497 expire 1583885347 last 1583885270 Mar 10 17:11:37 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 17:11:54 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 8a6b175a-81e5-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c88d07d8800, cur 1583885514 expire 1583885364 last 1583885287 Mar 10 17:12:12 fir-io7-s1 kernel: LNetError: 29894:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 17:12:12 fir-io7-s1 kernel: LNetError: 29894:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 10 17:12:27 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 10 17:12:27 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 10 17:17:17 fir-io7-s1 kernel: LNetError: 29894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 17:17:17 fir-io7-s1 kernel: LNetError: 29894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 381 previous similar messages Mar 10 17:18:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 10 17:18:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 206 previous similar messages Mar 10 17:22:12 fir-io7-s1 kernel: LNetError: 30376:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 17:22:12 fir-io7-s1 kernel: LNetError: 30376:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages Mar 10 17:27:22 fir-io7-s1 kernel: LNetError: 30242:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 17:27:22 fir-io7-s1 kernel: LNetError: 30242:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 10 17:28:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 17:28:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 261 previous similar messages Mar 10 17:32:12 fir-io7-s1 kernel: LNetError: 30601:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 17:32:12 fir-io7-s1 kernel: LNetError: 30601:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 10 17:37:22 fir-io7-s1 kernel: LNetError: 30601:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 17:37:22 fir-io7-s1 kernel: LNetError: 30601:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 10 17:38:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 10 17:38:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 210 previous similar messages Mar 10 17:42:12 fir-io7-s1 kernel: LNetError: 30951:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 17:42:12 fir-io7-s1 kernel: LNetError: 30951:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages Mar 10 17:47:23 fir-io7-s1 kernel: LNetError: 30951:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 17:47:23 fir-io7-s1 kernel: LNetError: 30951:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 415 previous similar messages Mar 10 17:48:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 10 17:48:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 173 previous similar messages Mar 10 17:52:17 fir-io7-s1 kernel: LNetError: 31319:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 17:52:17 fir-io7-s1 kernel: LNetError: 31319:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 215 previous similar messages Mar 10 17:57:27 fir-io7-s1 kernel: LNetError: 31319:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 17:57:27 fir-io7-s1 kernel: LNetError: 31319:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 10 17:58:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 5 seconds Mar 10 17:58:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 204 previous similar messages Mar 10 18:02:17 fir-io7-s1 kernel: LNetError: 31664:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 18:02:17 fir-io7-s1 kernel: LNetError: 31664:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 145 previous similar messages Mar 10 18:07:27 fir-io7-s1 kernel: LNetError: 31664:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 18:07:27 fir-io7-s1 kernel: LNetError: 31664:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 10 18:08:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 18:08:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 50 previous similar messages Mar 10 18:12:17 fir-io7-s1 kernel: LNetError: 31664:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 18:12:17 fir-io7-s1 kernel: LNetError: 31664:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 129 previous similar messages Mar 10 18:17:32 fir-io7-s1 kernel: LNetError: 32210:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 18:17:32 fir-io7-s1 kernel: LNetError: 32210:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 10 18:18:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 18:18:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 498 previous similar messages Mar 10 18:22:17 fir-io7-s1 kernel: LNetError: 32385:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 18:22:17 fir-io7-s1 kernel: LNetError: 32385:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 10 18:27:42 fir-io7-s1 kernel: LNetError: 32635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 18:27:42 fir-io7-s1 kernel: LNetError: 32635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 344 previous similar messages Mar 10 18:28:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 10 18:28:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 429 previous similar messages Mar 10 18:32:17 fir-io7-s1 kernel: LNetError: 32139:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 18:32:17 fir-io7-s1 kernel: LNetError: 32139:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 236 previous similar messages Mar 10 18:37:42 fir-io7-s1 kernel: LNetError: 32635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 18:37:42 fir-io7-s1 kernel: LNetError: 32635:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 356 previous similar messages Mar 10 18:39:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 10 18:39:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 430 previous similar messages Mar 10 18:42:17 fir-io7-s1 kernel: LNetError: 32635:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 18:42:17 fir-io7-s1 kernel: LNetError: 32635:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 215 previous similar messages Mar 10 18:46:39 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 86b9edab-73b3-4 (at 10.49.30.18@o2ib1) Mar 10 18:46:39 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 18:47:42 fir-io7-s1 kernel: LNetError: 33258:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 18:47:42 fir-io7-s1 kernel: LNetError: 33258:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 10 18:49:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 18:49:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 289 previous similar messages Mar 10 18:52:17 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 18:52:17 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 10 18:57:42 fir-io7-s1 kernel: LNetError: 33430:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 18:57:42 fir-io7-s1 kernel: LNetError: 33430:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 10 18:59:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 10 18:59:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 228 previous similar messages Mar 10 19:00:40 fir-io7-s1 kernel: LustreError: 73718:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli 8b6b6a33-9ab5-4 claims 61440 GRANT, real grant 32768 Mar 10 19:02:17 fir-io7-s1 kernel: LNetError: 32139:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 19:02:17 fir-io7-s1 kernel: LNetError: 32139:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 231 previous similar messages Mar 10 19:07:47 fir-io7-s1 kernel: LNetError: 33785:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 19:07:47 fir-io7-s1 kernel: LNetError: 33785:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 10 19:09:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 19:09:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 410 previous similar messages Mar 10 19:12:17 fir-io7-s1 kernel: LNetError: 34151:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 19:12:17 fir-io7-s1 kernel: LNetError: 34151:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 10 19:14:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 3e887b31-c0fb-4 (at 10.50.2.31@o2ib2) Mar 10 19:14:42 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 3e887b31-c0fb-4 (at 10.50.2.31@o2ib2) Mar 10 19:14:42 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 10 19:17:47 fir-io7-s1 kernel: LNetError: 34151:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 19:17:47 fir-io7-s1 kernel: LNetError: 34151:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 323 previous similar messages Mar 10 19:19:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 10 19:19:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 422 previous similar messages Mar 10 19:22:17 fir-io7-s1 kernel: LNetError: 34081:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 19:22:17 fir-io7-s1 kernel: LNetError: 34081:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages Mar 10 19:27:47 fir-io7-s1 kernel: LNetError: 34503:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 19:27:47 fir-io7-s1 kernel: LNetError: 34503:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 10 19:29:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 7 seconds Mar 10 19:29:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 364 previous similar messages Mar 10 19:32:17 fir-io7-s1 kernel: LNetError: 34847:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 19:32:17 fir-io7-s1 kernel: LNetError: 34847:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 215 previous similar messages Mar 10 19:37:47 fir-io7-s1 kernel: LNetError: 34848:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 19:37:47 fir-io7-s1 kernel: LNetError: 34848:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 10 19:39:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 4 seconds Mar 10 19:39:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 146 previous similar messages Mar 10 19:42:17 fir-io7-s1 kernel: LNetError: 35196:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 19:42:17 fir-io7-s1 kernel: LNetError: 35196:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages Mar 10 19:47:52 fir-io7-s1 kernel: LNetError: 35406:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 19:47:52 fir-io7-s1 kernel: LNetError: 35406:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 10 19:49:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 10 19:49:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 346 previous similar messages Mar 10 19:52:17 fir-io7-s1 kernel: LNetError: 34678:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 19:52:17 fir-io7-s1 kernel: LNetError: 34678:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 262 previous similar messages Mar 10 19:57:57 fir-io7-s1 kernel: LNetError: 35406:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 19:57:57 fir-io7-s1 kernel: LNetError: 35406:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 385 previous similar messages Mar 10 19:59:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 10 19:59:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 426 previous similar messages Mar 10 20:02:22 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 20:02:22 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 10 20:07:57 fir-io7-s1 kernel: LNetError: 35917:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 20:07:57 fir-io7-s1 kernel: LNetError: 35917:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 10 20:09:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 5 seconds Mar 10 20:09:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 315 previous similar messages Mar 10 20:12:22 fir-io7-s1 kernel: LNetError: 36286:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 20:12:22 fir-io7-s1 kernel: LNetError: 36286:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 10 20:17:57 fir-io7-s1 kernel: LNetError: 36286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 20:17:57 fir-io7-s1 kernel: LNetError: 36286:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 390 previous similar messages Mar 10 20:19:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 10 20:19:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 387 previous similar messages Mar 10 20:22:22 fir-io7-s1 kernel: LNetError: 36637:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 20:22:22 fir-io7-s1 kernel: LNetError: 36637:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 10 20:28:02 fir-io7-s1 kernel: LNetError: 36827:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 20:28:02 fir-io7-s1 kernel: LNetError: 36827:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 348 previous similar messages Mar 10 20:29:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 20:29:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 205 previous similar messages Mar 10 20:32:22 fir-io7-s1 kernel: LNetError: 36827:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 20:32:22 fir-io7-s1 kernel: LNetError: 36827:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 10 20:38:03 fir-io7-s1 kernel: LNetError: 37143:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 20:38:03 fir-io7-s1 kernel: LNetError: 37143:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 398 previous similar messages Mar 10 20:39:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 10 20:39:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 216 previous similar messages Mar 10 20:42:26 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 20:42:26 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 10 20:48:07 fir-io7-s1 kernel: LNetError: 37665:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 20:48:07 fir-io7-s1 kernel: LNetError: 37665:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 10 20:49:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 20:49:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 205 previous similar messages Mar 10 20:52:27 fir-io7-s1 kernel: LNetError: 37665:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 20:52:27 fir-io7-s1 kernel: LNetError: 37665:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 10 20:58:07 fir-io7-s1 kernel: LNetError: 37843:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 20:58:07 fir-io7-s1 kernel: LNetError: 37843:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 10 20:59:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 2 seconds Mar 10 20:59:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 186 previous similar messages Mar 10 21:02:27 fir-io7-s1 kernel: LNetError: 38052:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 21:02:27 fir-io7-s1 kernel: LNetError: 38052:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 10 21:08:12 fir-io7-s1 kernel: LNetError: 38279:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 21:08:12 fir-io7-s1 kernel: LNetError: 38279:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 10 21:09:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 10 21:09:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 546 previous similar messages Mar 10 21:12:32 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 21:12:32 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 10 21:18:12 fir-io7-s1 kernel: LNetError: 38279:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 21:18:12 fir-io7-s1 kernel: LNetError: 38279:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 10 21:20:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 21:20:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 570 previous similar messages Mar 10 21:22:37 fir-io7-s1 kernel: LNetError: 38769:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 21:22:37 fir-io7-s1 kernel: LNetError: 38769:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 10 21:28:12 fir-io7-s1 kernel: LNetError: 38769:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 10 21:28:12 fir-io7-s1 kernel: LNetError: 38769:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 369 previous similar messages Mar 10 21:30:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 10 21:30:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 400 previous similar messages Mar 10 21:32:37 fir-io7-s1 kernel: LNetError: 38821:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 21:32:37 fir-io7-s1 kernel: LNetError: 38821:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 10 21:38:12 fir-io7-s1 kernel: LNetError: 39121:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 21:38:12 fir-io7-s1 kernel: LNetError: 39121:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 10 21:40:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 10 21:40:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 305 previous similar messages Mar 10 21:42:37 fir-io7-s1 kernel: LNetError: 39469:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 21:42:37 fir-io7-s1 kernel: LNetError: 39469:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 10 21:48:12 fir-io7-s1 kernel: LNetError: 39469:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 21:48:12 fir-io7-s1 kernel: LNetError: 39469:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 378 previous similar messages Mar 10 21:50:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 10 21:50:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 288 previous similar messages Mar 10 21:52:37 fir-io7-s1 kernel: LNetError: 39823:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 21:52:37 fir-io7-s1 kernel: LNetError: 39823:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 10 21:58:17 fir-io7-s1 kernel: LNetError: 39823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 21:58:17 fir-io7-s1 kernel: LNetError: 39823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 372 previous similar messages Mar 10 22:00:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 10 22:00:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 415 previous similar messages Mar 10 22:02:37 fir-io7-s1 kernel: LNetError: 40176:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 22:02:37 fir-io7-s1 kernel: LNetError: 40176:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 10 22:08:22 fir-io7-s1 kernel: LNetError: 40176:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 22:08:22 fir-io7-s1 kernel: LNetError: 40176:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 10 22:10:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 3 seconds Mar 10 22:10:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 259 previous similar messages Mar 10 22:12:37 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 22:12:37 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 10 22:18:16 fir-io7-s1 kernel: LustreError: 40244:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli dc0054a8-33f9-4 claims 90112 GRANT, real grant 0 Mar 10 22:18:22 fir-io7-s1 kernel: LNetError: 40564:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 22:18:22 fir-io7-s1 kernel: LNetError: 40564:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 10 22:20:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 10 22:20:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 295 previous similar messages Mar 10 22:22:37 fir-io7-s1 kernel: LNetError: 40318:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 22:22:37 fir-io7-s1 kernel: LNetError: 40318:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 10 22:28:22 fir-io7-s1 kernel: LNetError: 41011:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 22:28:22 fir-io7-s1 kernel: LNetError: 41011:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 10 22:30:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 9 seconds Mar 10 22:30:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 417 previous similar messages Mar 10 22:32:37 fir-io7-s1 kernel: LNetError: 41358:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 22:32:37 fir-io7-s1 kernel: LNetError: 41358:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 10 22:38:13 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 47fc00ff-043a-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f086800, cur 1583905093 expire 1583904943 last 1583904866 Mar 10 22:38:13 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 10 22:38:22 fir-io7-s1 kernel: LNetError: 41641:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 10 22:38:22 fir-io7-s1 kernel: LNetError: 41641:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 10 22:39:04 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 10 22:39:04 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 10 22:40:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 10 22:40:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 304 previous similar messages Mar 10 22:42:37 fir-io7-s1 kernel: LNetError: 41641:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 22:42:37 fir-io7-s1 kernel: LNetError: 41641:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 10 22:48:27 fir-io7-s1 kernel: LNetError: 41851:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 10 22:48:27 fir-io7-s1 kernel: LNetError: 41851:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 314 previous similar messages Mar 10 22:50:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 10 22:50:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 222 previous similar messages Mar 10 22:52:37 fir-io7-s1 kernel: LNetError: 41851:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 22:52:37 fir-io7-s1 kernel: LNetError: 41851:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 10 22:58:27 fir-io7-s1 kernel: LNetError: 42201:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 22:58:27 fir-io7-s1 kernel: LNetError: 42201:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 312 previous similar messages Mar 10 23:00:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 8 seconds Mar 10 23:00:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 291 previous similar messages Mar 10 23:02:37 fir-io7-s1 kernel: LNetError: 42400:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 23:02:37 fir-io7-s1 kernel: LNetError: 42400:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 10 23:08:37 fir-io7-s1 kernel: LNetError: 42400:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 23:08:37 fir-io7-s1 kernel: LNetError: 42400:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 10 23:10:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 10 23:10:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 338 previous similar messages Mar 10 23:12:37 fir-io7-s1 kernel: LNetError: 42770:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 23:12:37 fir-io7-s1 kernel: LNetError: 42770:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 10 23:18:42 fir-io7-s1 kernel: LNetError: 42770:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 10 23:18:42 fir-io7-s1 kernel: LNetError: 42770:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 10 23:20:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 10 23:20:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 283 previous similar messages Mar 10 23:22:37 fir-io7-s1 kernel: LNetError: 43261:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 23:22:37 fir-io7-s1 kernel: LNetError: 43261:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 10 23:28:42 fir-io7-s1 kernel: LNetError: 43117:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 10 23:28:42 fir-io7-s1 kernel: LNetError: 43117:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 321 previous similar messages Mar 10 23:30:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 10 23:30:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 289 previous similar messages Mar 10 23:32:37 fir-io7-s1 kernel: LNetError: 42018:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 23:32:37 fir-io7-s1 kernel: LNetError: 42018:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 10 23:38:42 fir-io7-s1 kernel: LNetError: 43468:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 10 23:38:42 fir-io7-s1 kernel: LNetError: 43468:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 10 23:41:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 10 23:41:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 362 previous similar messages Mar 10 23:42:42 fir-io7-s1 kernel: LNetError: 43820:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 10 23:42:42 fir-io7-s1 kernel: LNetError: 43820:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 10 23:48:42 fir-io7-s1 kernel: LNetError: 43820:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 23:48:42 fir-io7-s1 kernel: LNetError: 43820:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 10 23:51:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 10 23:51:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 275 previous similar messages Mar 10 23:52:42 fir-io7-s1 kernel: LNetError: 44171:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 10 23:52:42 fir-io7-s1 kernel: LNetError: 44171:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 10 23:58:42 fir-io7-s1 kernel: LNetError: 44526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 10 23:58:42 fir-io7-s1 kernel: LNetError: 44526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 11 00:01:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 5 seconds Mar 11 00:01:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 256 previous similar messages Mar 11 00:02:47 fir-io7-s1 kernel: LNetError: 44526:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 00:02:47 fir-io7-s1 kernel: LNetError: 44526:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 11 00:08:42 fir-io7-s1 kernel: LNetError: 44526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 00:08:42 fir-io7-s1 kernel: LNetError: 44526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 335 previous similar messages Mar 11 00:11:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 1 seconds Mar 11 00:11:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 220 previous similar messages Mar 11 00:12:47 fir-io7-s1 kernel: LNetError: 44526:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 00:12:47 fir-io7-s1 kernel: LNetError: 44526:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 11 00:18:42 fir-io7-s1 kernel: LNetError: 44526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 00:18:42 fir-io7-s1 kernel: LNetError: 44526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 11 00:21:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 11 00:21:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 284 previous similar messages Mar 11 00:22:47 fir-io7-s1 kernel: LNetError: 44061:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 00:22:47 fir-io7-s1 kernel: LNetError: 44061:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 11 00:28:52 fir-io7-s1 kernel: LNetError: 45252:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 00:28:52 fir-io7-s1 kernel: LNetError: 45252:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 11 00:31:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 1 seconds Mar 11 00:31:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 301 previous similar messages Mar 11 00:32:48 fir-io7-s1 kernel: LNetError: 45618:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 00:32:48 fir-io7-s1 kernel: LNetError: 45618:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 11 00:39:07 fir-io7-s1 kernel: LNetError: 45618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 00:39:07 fir-io7-s1 kernel: LNetError: 45618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 11 00:41:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 11 00:41:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 270 previous similar messages Mar 11 00:42:48 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 00:42:48 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 11 00:49:07 fir-io7-s1 kernel: LNetError: 45618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 00:49:07 fir-io7-s1 kernel: LNetError: 45618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages Mar 11 00:51:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 00:51:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 196 previous similar messages Mar 11 00:52:52 fir-io7-s1 kernel: LNetError: 46325:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 00:52:52 fir-io7-s1 kernel: LNetError: 46325:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 201 previous similar messages Mar 11 00:59:07 fir-io7-s1 kernel: LNetError: 46325:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 00:59:07 fir-io7-s1 kernel: LNetError: 46325:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 365 previous similar messages Mar 11 01:01:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 2 seconds Mar 11 01:01:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 215 previous similar messages Mar 11 01:02:52 fir-io7-s1 kernel: LNetError: 46681:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 01:02:52 fir-io7-s1 kernel: LNetError: 46681:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 11 01:09:07 fir-io7-s1 kernel: LNetError: 46681:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 01:09:07 fir-io7-s1 kernel: LNetError: 46681:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 319 previous similar messages Mar 11 01:11:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 11 01:11:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 290 previous similar messages Mar 11 01:12:52 fir-io7-s1 kernel: LNetError: 47051:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 01:12:52 fir-io7-s1 kernel: LNetError: 47051:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 220 previous similar messages Mar 11 01:19:07 fir-io7-s1 kernel: LNetError: 47051:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 01:19:07 fir-io7-s1 kernel: LNetError: 47051:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 347 previous similar messages Mar 11 01:21:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 01:21:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 667 previous similar messages Mar 11 01:22:52 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 01:22:52 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 11 01:29:07 fir-io7-s1 kernel: LNetError: 47408:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 01:29:07 fir-io7-s1 kernel: LNetError: 47408:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 11 01:31:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 2 seconds Mar 11 01:31:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 526 previous similar messages Mar 11 01:32:55 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 01:32:55 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages Mar 11 01:39:07 fir-io7-s1 kernel: LNetError: 47754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 01:39:07 fir-io7-s1 kernel: LNetError: 47754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 11 01:41:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 1 seconds Mar 11 01:41:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 533 previous similar messages Mar 11 01:42:57 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 01:42:57 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 250 previous similar messages Mar 11 01:49:12 fir-io7-s1 kernel: LNetError: 47754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 01:49:12 fir-io7-s1 kernel: LNetError: 47754:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages Mar 11 01:52:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 11 01:52:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 521 previous similar messages Mar 11 01:52:57 fir-io7-s1 kernel: LNetError: 47486:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 01:52:57 fir-io7-s1 kernel: LNetError: 47486:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages Mar 11 01:59:22 fir-io7-s1 kernel: LNetError: 48458:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 01:59:22 fir-io7-s1 kernel: LNetError: 48458:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 11 02:02:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 11 02:02:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 549 previous similar messages Mar 11 02:02:57 fir-io7-s1 kernel: LNetError: 48807:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 02:02:57 fir-io7-s1 kernel: LNetError: 48807:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 227 previous similar messages Mar 11 02:03:51 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 758e4337-ab25-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f4e8c00, cur 1583917431 expire 1583917281 last 1583917204 Mar 11 02:03:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 02:08:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 11 02:08:44 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 02:08:44 fir-io7-s1 kernel: Lustre: Skipped 7 previous similar messages Mar 11 02:09:09 fir-io7-s1 kernel: LustreError: 137-5: fir-OST0051_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 11 02:09:22 fir-io7-s1 kernel: LNetError: 48807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 02:09:22 fir-io7-s1 kernel: LNetError: 48807:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 11 02:12:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 11 02:12:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 413 previous similar messages Mar 11 02:12:57 fir-io7-s1 kernel: LNetError: 49176:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 02:12:57 fir-io7-s1 kernel: LNetError: 49176:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 11 02:19:22 fir-io7-s1 kernel: LNetError: 49176:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 02:19:22 fir-io7-s1 kernel: LNetError: 49176:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 11 02:22:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 11 02:22:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 286 previous similar messages Mar 11 02:22:57 fir-io7-s1 kernel: LNetError: 49525:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 02:22:57 fir-io7-s1 kernel: LNetError: 49525:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 11 02:29:22 fir-io7-s1 kernel: LNetError: 49525:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 02:29:22 fir-io7-s1 kernel: LNetError: 49525:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 11 02:32:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 02:32:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 333 previous similar messages Mar 11 02:32:57 fir-io7-s1 kernel: LNetError: 49608:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 02:32:57 fir-io7-s1 kernel: LNetError: 49608:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 11 02:39:22 fir-io7-s1 kernel: LNetError: 49870:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 02:39:22 fir-io7-s1 kernel: LNetError: 49870:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages Mar 11 02:42:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 3 seconds Mar 11 02:42:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 388 previous similar messages Mar 11 02:42:57 fir-io7-s1 kernel: LNetError: 50243:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 02:42:57 fir-io7-s1 kernel: LNetError: 50243:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages Mar 11 02:47:27 fir-io7-s1 kernel: LustreError: 40827:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004c: cli 20917503-c5d6-4 claims 32768 GRANT, real grant 0 Mar 11 02:49:22 fir-io7-s1 kernel: LNetError: 50243:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 02:49:22 fir-io7-s1 kernel: LNetError: 50243:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 335 previous similar messages Mar 11 02:52:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 02:52:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 330 previous similar messages Mar 11 02:52:57 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 02:52:57 fir-io7-s1 kernel: LNetError: 27965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 11 02:59:22 fir-io7-s1 kernel: LNetError: 50603:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 02:59:22 fir-io7-s1 kernel: LNetError: 50603:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 297 previous similar messages Mar 11 03:02:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 11 03:02:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 192 previous similar messages Mar 11 03:02:57 fir-io7-s1 kernel: LNetError: 50971:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 03:02:57 fir-io7-s1 kernel: LNetError: 50971:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 11 03:09:22 fir-io7-s1 kernel: LNetError: 50971:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 03:09:22 fir-io7-s1 kernel: LNetError: 50971:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 381 previous similar messages Mar 11 03:12:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 11 03:12:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 489 previous similar messages Mar 11 03:12:57 fir-io7-s1 kernel: LNetError: 51344:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 03:12:57 fir-io7-s1 kernel: LNetError: 51344:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 255 previous similar messages Mar 11 03:19:22 fir-io7-s1 kernel: LNetError: 27965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 03:19:22 fir-io7-s1 kernel: LNetError: 27965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 369 previous similar messages Mar 11 03:22:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 03:22:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 297 previous similar messages Mar 11 03:22:57 fir-io7-s1 kernel: LNetError: 51808:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 03:22:57 fir-io7-s1 kernel: LNetError: 51808:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 235 previous similar messages Mar 11 03:29:22 fir-io7-s1 kernel: LNetError: 51808:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 03:29:22 fir-io7-s1 kernel: LNetError: 51808:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 11 03:31:48 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 267b1079-73f5-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4a0ea91000, cur 1583922708 expire 1583922558 last 1583922481 Mar 11 03:31:48 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 03:32:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 11 03:32:42 fir-io7-s1 kernel: Lustre: Skipped 9 previous similar messages Mar 11 03:32:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 11 03:32:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 285 previous similar messages Mar 11 03:32:57 fir-io7-s1 kernel: LNetError: 52066:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 03:32:57 fir-io7-s1 kernel: LNetError: 52066:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 11 03:39:22 fir-io7-s1 kernel: LNetError: 52301:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 03:39:22 fir-io7-s1 kernel: LNetError: 52301:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 11 03:42:57 fir-io7-s1 kernel: LNetError: 52301:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 03:42:57 fir-io7-s1 kernel: LNetError: 52301:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 222 previous similar messages Mar 11 03:43:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 11 03:43:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 305 previous similar messages Mar 11 03:49:22 fir-io7-s1 kernel: LNetError: 52553:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 03:49:22 fir-io7-s1 kernel: LNetError: 52553:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 11 03:52:57 fir-io7-s1 kernel: LNetError: 52773:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 03:52:57 fir-io7-s1 kernel: LNetError: 52773:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 11 03:53:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 03:53:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 135 previous similar messages Mar 11 03:59:22 fir-io7-s1 kernel: LNetError: 52773:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 03:59:22 fir-io7-s1 kernel: LNetError: 52773:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 375 previous similar messages Mar 11 04:02:57 fir-io7-s1 kernel: LNetError: 53123:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 04:02:57 fir-io7-s1 kernel: LNetError: 53123:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 11 04:03:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 11 04:03:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 200 previous similar messages Mar 11 04:08:21 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c5a01202400, cur 1583924901 expire 1583924751 last 1583924674 Mar 11 04:08:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 04:09:22 fir-io7-s1 kernel: LNetError: 53123:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 04:09:22 fir-io7-s1 kernel: LNetError: 53123:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 11 04:12:57 fir-io7-s1 kernel: LNetError: 53493:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 04:12:57 fir-io7-s1 kernel: LNetError: 53493:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 11 04:13:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 2 seconds Mar 11 04:13:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 170 previous similar messages Mar 11 04:19:27 fir-io7-s1 kernel: LNetError: 53493:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 04:19:27 fir-io7-s1 kernel: LNetError: 53493:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 11 04:22:26 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:22:26 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 11 04:22:26 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:22:26 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 11 04:22:41 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004d_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 11 04:22:41 fir-io7-s1 kernel: LustreError: Skipped 5 previous similar messages Mar 11 04:22:57 fir-io7-s1 kernel: LNetError: 53846:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 04:22:57 fir-io7-s1 kernel: LNetError: 53846:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 11 04:23:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 11 04:23:06 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:23:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 04:23:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 04:23:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 451 previous similar messages Mar 11 04:25:55 fir-io7-s1 kernel: Lustre: fir-OST0052: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 11 04:25:55 fir-io7-s1 kernel: Lustre: fir-OST0052: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:25:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 04:26:19 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Mar 11 04:26:19 fir-io7-s1 kernel: LustreError: Skipped 2 previous similar messages Mar 11 04:26:44 fir-io7-s1 kernel: Lustre: fir-OST0052: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 11 04:26:44 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 11 04:26:44 fir-io7-s1 kernel: Lustre: fir-OST0052: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:29:13 fir-io7-s1 kernel: Lustre: fir-OST0050: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 11 04:29:13 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:29:13 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 11 04:29:32 fir-io7-s1 kernel: LNetError: 53846:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 04:29:32 fir-io7-s1 kernel: LNetError: 53846:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 328 previous similar messages Mar 11 04:29:38 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 11 04:29:38 fir-io7-s1 kernel: LustreError: Skipped 1 previous similar message Mar 11 04:30:03 fir-io7-s1 kernel: Lustre: fir-OST004a: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 11 04:30:03 fir-io7-s1 kernel: Lustre: fir-OST0050: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 11 04:30:03 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 11 04:30:03 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 11 04:32:22 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Mar 11 04:32:22 fir-io7-s1 kernel: LustreError: Skipped 1 previous similar message Mar 11 04:32:57 fir-io7-s1 kernel: LNetError: 54199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 04:32:57 fir-io7-s1 kernel: LNetError: 54199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 11 04:33:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 11 04:33:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 312 previous similar messages Mar 11 04:34:42 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 11 04:34:42 fir-io7-s1 kernel: LustreError: Skipped 1 previous similar message Mar 11 04:36:36 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 11 04:39:32 fir-io7-s1 kernel: LNetError: 54199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 04:39:32 fir-io7-s1 kernel: LNetError: 54199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 11 04:42:57 fir-io7-s1 kernel: LNetError: 54666:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 04:42:57 fir-io7-s1 kernel: LNetError: 54666:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 11 04:43:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 11 04:43:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 398 previous similar messages Mar 11 04:49:42 fir-io7-s1 kernel: LNetError: 54666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 04:49:42 fir-io7-s1 kernel: LNetError: 54666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 11 04:52:57 fir-io7-s1 kernel: LNetError: 54908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 04:52:57 fir-io7-s1 kernel: LNetError: 54908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 11 04:53:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 11 04:53:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 419 previous similar messages Mar 11 04:59:42 fir-io7-s1 kernel: LNetError: 54908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 04:59:42 fir-io7-s1 kernel: LNetError: 54908:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 11 05:02:57 fir-io7-s1 kernel: LNetError: 55261:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 05:02:57 fir-io7-s1 kernel: LNetError: 55261:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 11 05:03:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 05:03:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 356 previous similar messages Mar 11 05:09:47 fir-io7-s1 kernel: LNetError: 55261:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 05:09:47 fir-io7-s1 kernel: LNetError: 55261:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 11 05:12:57 fir-io7-s1 kernel: LNetError: 55694:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 05:12:57 fir-io7-s1 kernel: LNetError: 55694:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 11 05:14:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 11 05:14:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 287 previous similar messages Mar 11 05:19:53 fir-io7-s1 kernel: LNetError: 55629:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 05:19:53 fir-io7-s1 kernel: LNetError: 55629:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 11 05:23:03 fir-io7-s1 kernel: LNetError: 55993:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 05:23:03 fir-io7-s1 kernel: LNetError: 55993:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 11 05:24:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 05:24:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 360 previous similar messages Mar 11 05:30:03 fir-io7-s1 kernel: LNetError: 55993:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 05:30:03 fir-io7-s1 kernel: LNetError: 55993:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 05:33:07 fir-io7-s1 kernel: LNetError: 56447:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 05:33:07 fir-io7-s1 kernel: LNetError: 56447:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 11 05:34:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 11 05:34:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 200 previous similar messages Mar 11 05:40:07 fir-io7-s1 kernel: LNetError: 56372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 05:40:07 fir-io7-s1 kernel: LNetError: 56372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 314 previous similar messages Mar 11 05:43:07 fir-io7-s1 kernel: LNetError: 56724:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 05:43:07 fir-io7-s1 kernel: LNetError: 56724:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 11 05:44:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 05:44:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 270 previous similar messages Mar 11 05:50:07 fir-io7-s1 kernel: LNetError: 56372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 05:50:07 fir-io7-s1 kernel: LNetError: 56372:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 05:53:07 fir-io7-s1 kernel: LNetError: 57071:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 05:53:07 fir-io7-s1 kernel: LNetError: 57071:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 11 05:54:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 11 05:54:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 256 previous similar messages Mar 11 06:00:07 fir-io7-s1 kernel: LNetError: 57071:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 06:00:07 fir-io7-s1 kernel: LNetError: 57071:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 11 06:03:07 fir-io7-s1 kernel: LNetError: 57422:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 06:03:07 fir-io7-s1 kernel: LNetError: 57422:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 11 06:04:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 11 06:04:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 331 previous similar messages Mar 11 06:10:07 fir-io7-s1 kernel: LNetError: 57630:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 06:10:07 fir-io7-s1 kernel: LNetError: 57630:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 11 06:13:07 fir-io7-s1 kernel: LNetError: 57630:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 06:13:07 fir-io7-s1 kernel: LNetError: 57630:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 11 06:14:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 11 06:14:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 320 previous similar messages Mar 11 06:20:12 fir-io7-s1 kernel: LNetError: 57889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 06:20:12 fir-io7-s1 kernel: LNetError: 57889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 11 06:23:07 fir-io7-s1 kernel: LNetError: 58145:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 06:23:07 fir-io7-s1 kernel: LNetError: 58145:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 11 06:24:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 11 06:24:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 201 previous similar messages Mar 11 06:30:12 fir-io7-s1 kernel: LNetError: 58145:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 06:30:12 fir-io7-s1 kernel: LNetError: 58145:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 11 06:33:07 fir-io7-s1 kernel: LNetError: 58145:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 06:33:07 fir-io7-s1 kernel: LNetError: 58145:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 11 06:34:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 11 06:34:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 235 previous similar messages Mar 11 06:40:17 fir-io7-s1 kernel: LNetError: 58801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 06:40:17 fir-io7-s1 kernel: LNetError: 58801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 11 06:43:07 fir-io7-s1 kernel: LNetError: 56724:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 06:43:07 fir-io7-s1 kernel: LNetError: 56724:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 11 06:44:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 3 seconds Mar 11 06:44:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 292 previous similar messages Mar 11 06:50:22 fir-io7-s1 kernel: LNetError: 58801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 06:50:22 fir-io7-s1 kernel: LNetError: 58801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 11 06:53:07 fir-io7-s1 kernel: LNetError: 59191:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 06:53:07 fir-io7-s1 kernel: LNetError: 59191:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 11 06:55:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 11 06:55:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 332 previous similar messages Mar 11 07:00:22 fir-io7-s1 kernel: LNetError: 59191:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 07:00:22 fir-io7-s1 kernel: LNetError: 59191:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 11 07:03:07 fir-io7-s1 kernel: LNetError: 59538:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 07:03:07 fir-io7-s1 kernel: LNetError: 59538:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 11 07:05:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 07:05:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 300 previous similar messages Mar 11 07:10:27 fir-io7-s1 kernel: LNetError: 59538:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 07:10:27 fir-io7-s1 kernel: LNetError: 59538:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 11 07:13:07 fir-io7-s1 kernel: LNetError: 59538:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 07:13:07 fir-io7-s1 kernel: LNetError: 59538:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 11 07:15:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 11 07:15:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 555 previous similar messages Mar 11 07:20:32 fir-io7-s1 kernel: LNetError: 59538:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 07:20:32 fir-io7-s1 kernel: LNetError: 59538:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 07:23:07 fir-io7-s1 kernel: LNetError: 60259:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 07:23:07 fir-io7-s1 kernel: LNetError: 60259:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 11 07:25:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 11 07:25:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 808 previous similar messages Mar 11 07:30:42 fir-io7-s1 kernel: LNetError: 60259:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 07:30:42 fir-io7-s1 kernel: LNetError: 60259:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 11 07:33:12 fir-io7-s1 kernel: LNetError: 60613:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 07:33:12 fir-io7-s1 kernel: LNetError: 60613:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 96 previous similar messages Mar 11 07:35:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 11 07:35:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 659 previous similar messages Mar 11 07:40:42 fir-io7-s1 kernel: LNetError: 60613:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 07:40:42 fir-io7-s1 kernel: LNetError: 60613:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 11 07:43:17 fir-io7-s1 kernel: LNetError: 60000:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 07:43:17 fir-io7-s1 kernel: LNetError: 60000:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 11 07:45:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 11 07:45:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 523 previous similar messages Mar 11 07:50:42 fir-io7-s1 kernel: LNetError: 60966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 07:50:42 fir-io7-s1 kernel: LNetError: 60966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 11 07:53:22 fir-io7-s1 kernel: LNetError: 61315:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 07:53:22 fir-io7-s1 kernel: LNetError: 61315:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 11 07:55:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 7 seconds Mar 11 07:55:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 598 previous similar messages Mar 11 08:00:47 fir-io7-s1 kernel: LNetError: 61315:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 08:00:47 fir-io7-s1 kernel: LNetError: 61315:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 11 08:03:32 fir-io7-s1 kernel: LNetError: 61668:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 08:03:32 fir-io7-s1 kernel: LNetError: 61668:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 11 08:05:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 11 08:05:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 588 previous similar messages Mar 11 08:10:47 fir-io7-s1 kernel: LNetError: 61953:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 08:10:47 fir-io7-s1 kernel: LNetError: 61953:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 11 08:13:32 fir-io7-s1 kernel: LNetError: 61953:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 08:13:32 fir-io7-s1 kernel: LNetError: 61953:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 11 08:15:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 7 seconds Mar 11 08:15:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 539 previous similar messages Mar 11 08:20:47 fir-io7-s1 kernel: LNetError: 62131:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 08:20:47 fir-io7-s1 kernel: LNetError: 62131:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 11 08:23:42 fir-io7-s1 kernel: LNetError: 62379:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 08:23:42 fir-io7-s1 kernel: LNetError: 62379:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages Mar 11 08:25:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 4 seconds Mar 11 08:25:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 603 previous similar messages Mar 11 08:30:47 fir-io7-s1 kernel: LNetError: 60000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 08:30:47 fir-io7-s1 kernel: LNetError: 60000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 11 08:33:42 fir-io7-s1 kernel: LNetError: 62379:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 08:33:42 fir-io7-s1 kernel: LNetError: 62379:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 143 previous similar messages Mar 11 08:35:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 7 seconds Mar 11 08:35:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 617 previous similar messages Mar 11 08:40:47 fir-io7-s1 kernel: LNetError: 62824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 08:40:47 fir-io7-s1 kernel: LNetError: 62824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 11 08:43:46 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 08:43:46 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 11 08:45:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 08:45:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 484 previous similar messages Mar 11 08:48:39 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 11 08:48:39 fir-io7-s1 kernel: Lustre: Skipped 7 previous similar messages Mar 11 08:50:47 fir-io7-s1 kernel: LNetError: 63071:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 08:50:47 fir-io7-s1 kernel: LNetError: 63071:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 11 08:53:47 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 08:53:47 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 11 08:56:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 11 08:56:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 541 previous similar messages Mar 11 09:00:47 fir-io7-s1 kernel: LNetError: 63424:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 09:00:47 fir-io7-s1 kernel: LNetError: 63424:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 09:03:47 fir-io7-s1 kernel: LNetError: 63775:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 09:03:47 fir-io7-s1 kernel: LNetError: 63775:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 11 09:06:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 11 09:06:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 546 previous similar messages Mar 11 09:10:47 fir-io7-s1 kernel: LNetError: 63775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 09:10:47 fir-io7-s1 kernel: LNetError: 63775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 11 09:13:47 fir-io7-s1 kernel: LNetError: 63993:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 09:13:47 fir-io7-s1 kernel: LNetError: 63993:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 11 09:16:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 09:16:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 402 previous similar messages Mar 11 09:20:47 fir-io7-s1 kernel: LNetError: 64167:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 09:20:47 fir-io7-s1 kernel: LNetError: 64167:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 11 09:23:57 fir-io7-s1 kernel: LNetError: 64518:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 09:23:57 fir-io7-s1 kernel: LNetError: 64518:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 11 09:26:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 11 09:26:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 498 previous similar messages Mar 11 09:30:57 fir-io7-s1 kernel: LNetError: 64518:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 09:30:57 fir-io7-s1 kernel: LNetError: 64518:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 11 09:34:02 fir-io7-s1 kernel: LNetError: 60000:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 09:34:02 fir-io7-s1 kernel: LNetError: 60000:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 11 09:36:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 11 09:36:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 563 previous similar messages Mar 11 09:41:07 fir-io7-s1 kernel: LNetError: 64982:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 09:41:07 fir-io7-s1 kernel: LNetError: 64982:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 11 09:44:12 fir-io7-s1 kernel: LNetError: 65333:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 09:44:12 fir-io7-s1 kernel: LNetError: 65333:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 11 09:46:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 1 seconds Mar 11 09:46:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 520 previous similar messages Mar 11 09:51:07 fir-io7-s1 kernel: LNetError: 65333:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 09:51:07 fir-io7-s1 kernel: LNetError: 65333:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 11 09:54:12 fir-io7-s1 kernel: LNetError: 65694:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 09:54:12 fir-io7-s1 kernel: LNetError: 65694:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 11 09:56:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 09:56:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 451 previous similar messages Mar 11 09:57:48 fir-io7-s1 kernel: LustreError: 73706:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli 932170e3-8d55-4 claims 28672 GRANT, real grant 16384 Mar 11 10:01:07 fir-io7-s1 kernel: LNetError: 65694:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 10:01:07 fir-io7-s1 kernel: LNetError: 65694:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 11 10:04:12 fir-io7-s1 kernel: LNetError: 66062:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 10:04:12 fir-io7-s1 kernel: LNetError: 66062:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 11 10:06:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 11 10:06:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 316 previous similar messages Mar 11 10:11:17 fir-io7-s1 kernel: LNetError: 66062:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 10:11:17 fir-io7-s1 kernel: LNetError: 66062:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 11 10:14:12 fir-io7-s1 kernel: LNetError: 66428:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 10:14:12 fir-io7-s1 kernel: LNetError: 66428:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 11 10:16:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 10:16:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 408 previous similar messages Mar 11 10:21:17 fir-io7-s1 kernel: LNetError: 66612:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 10:21:17 fir-io7-s1 kernel: LNetError: 66612:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 285 previous similar messages Mar 11 10:24:12 fir-io7-s1 kernel: LNetError: 66612:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 10:24:12 fir-io7-s1 kernel: LNetError: 66612:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages Mar 11 10:26:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 11 10:26:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 346 previous similar messages Mar 11 10:31:22 fir-io7-s1 kernel: LNetError: 66947:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 10:31:22 fir-io7-s1 kernel: LNetError: 66947:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 11 10:34:12 fir-io7-s1 kernel: LNetError: 66947:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 10:34:12 fir-io7-s1 kernel: LNetError: 66947:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 11 10:36:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 11 10:36:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 302 previous similar messages Mar 11 10:41:27 fir-io7-s1 kernel: LNetError: 67549:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 10:41:27 fir-io7-s1 kernel: LNetError: 67549:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 11 10:44:12 fir-io7-s1 kernel: LNetError: 67549:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 10:44:12 fir-io7-s1 kernel: LNetError: 67549:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages Mar 11 10:46:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 2 seconds Mar 11 10:46:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 284 previous similar messages Mar 11 10:51:27 fir-io7-s1 kernel: LNetError: 67549:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 10:51:27 fir-io7-s1 kernel: LNetError: 67549:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 11 10:54:12 fir-io7-s1 kernel: LNetError: 67902:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 10:54:12 fir-io7-s1 kernel: LNetError: 67902:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 11 10:57:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds Mar 11 10:57:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 341 previous similar messages Mar 11 11:01:27 fir-io7-s1 kernel: LNetError: 67549:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 11:01:27 fir-io7-s1 kernel: LNetError: 67549:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 11 11:04:12 fir-io7-s1 kernel: LNetError: 68318:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 11:04:12 fir-io7-s1 kernel: LNetError: 68318:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 11 11:07:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds Mar 11 11:07:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 444 previous similar messages Mar 11 11:11:27 fir-io7-s1 kernel: LNetError: 68318:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 11:11:27 fir-io7-s1 kernel: LNetError: 68318:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 11 11:14:12 fir-io7-s1 kernel: LNetError: 69161:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 11:14:12 fir-io7-s1 kernel: LNetError: 69161:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 229 previous similar messages Mar 11 11:17:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 11 11:17:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 290 previous similar messages Mar 11 11:21:37 fir-io7-s1 kernel: LNetError: 69161:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 11:21:37 fir-io7-s1 kernel: LNetError: 69161:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 11 11:24:12 fir-io7-s1 kernel: LNetError: 69520:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 11:24:12 fir-io7-s1 kernel: LNetError: 69520:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 11 11:27:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 11 11:27:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 409 previous similar messages Mar 11 11:31:37 fir-io7-s1 kernel: LNetError: 69788:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 11:31:37 fir-io7-s1 kernel: LNetError: 69788:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 11:34:12 fir-io7-s1 kernel: LNetError: 69788:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 11:34:12 fir-io7-s1 kernel: LNetError: 69788:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 11 11:37:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 11:37:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 266 previous similar messages Mar 11 11:41:37 fir-io7-s1 kernel: LNetError: 69979:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 11:41:37 fir-io7-s1 kernel: LNetError: 69979:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 404 previous similar messages Mar 11 11:44:12 fir-io7-s1 kernel: LNetError: 70224:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 11:44:12 fir-io7-s1 kernel: LNetError: 70224:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 11 11:47:56 fir-io7-s1 kernel: LustreError: 68861:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004a: cli 932170e3-8d55-4 claims 28672 GRANT, real grant 24576 Mar 11 11:49:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 11 11:49:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 45 previous similar messages Mar 11 11:51:37 fir-io7-s1 kernel: LNetError: 60000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 11:51:37 fir-io7-s1 kernel: LNetError: 60000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 403 previous similar messages Mar 11 11:54:12 fir-io7-s1 kernel: LNetError: 70224:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 11:54:12 fir-io7-s1 kernel: LNetError: 70224:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 136 previous similar messages Mar 11 11:59:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 11 11:59:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 134 previous similar messages Mar 11 12:01:37 fir-io7-s1 kernel: LNetError: 70704:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 11 12:01:37 fir-io7-s1 kernel: LNetError: 70704:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 11 12:04:12 fir-io7-s1 kernel: LNetError: 69842:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 12:04:12 fir-io7-s1 kernel: LNetError: 69842:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 11 12:09:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 11 12:09:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 232 previous similar messages Mar 11 12:11:37 fir-io7-s1 kernel: LNetError: 71314:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 12:11:37 fir-io7-s1 kernel: LNetError: 71314:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 11 12:14:17 fir-io7-s1 kernel: LNetError: 71314:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 12:14:17 fir-io7-s1 kernel: LNetError: 71314:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 11 12:19:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 5 seconds Mar 11 12:19:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 240 previous similar messages Mar 11 12:21:47 fir-io7-s1 kernel: LNetError: 71314:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 12:21:47 fir-io7-s1 kernel: LNetError: 71314:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 11 12:24:17 fir-io7-s1 kernel: LNetError: 70547:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 12:24:17 fir-io7-s1 kernel: LNetError: 70547:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 11 12:29:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 11 12:29:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 234 previous similar messages Mar 11 12:31:47 fir-io7-s1 kernel: LNetError: 8644:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 12:31:47 fir-io7-s1 kernel: LNetError: 8644:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 309 previous similar messages Mar 11 12:34:17 fir-io7-s1 kernel: LNetError: 71524:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 12:34:17 fir-io7-s1 kernel: LNetError: 71524:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 11 12:40:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 4 seconds Mar 11 12:40:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 299 previous similar messages Mar 11 12:41:52 fir-io7-s1 kernel: LNetError: 71671:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 12:41:52 fir-io7-s1 kernel: LNetError: 71671:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 12:44:17 fir-io7-s1 kernel: LNetError: 72198:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 12:44:17 fir-io7-s1 kernel: LNetError: 72198:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 11 12:50:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 11 12:50:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 263 previous similar messages Mar 11 12:51:52 fir-io7-s1 kernel: LNetError: 24744:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 12:51:52 fir-io7-s1 kernel: LNetError: 24744:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 11 12:54:17 fir-io7-s1 kernel: LNetError: 72392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 12:54:17 fir-io7-s1 kernel: LNetError: 72392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 11 13:00:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 11 13:00:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 303 previous similar messages Mar 11 13:01:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 8ead35c6-eb56-4 (at 10.49.0.11@o2ib1) Mar 11 13:01:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 13:01:52 fir-io7-s1 kernel: LNetError: 72869:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 13:01:52 fir-io7-s1 kernel: LNetError: 72869:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 11 13:04:17 fir-io7-s1 kernel: LNetError: 73121:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 13:04:17 fir-io7-s1 kernel: LNetError: 73121:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 11 13:05:47 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client ead48f8c-3fb8-4 (at 10.50.8.20@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c71d111ec00, cur 1583957147 expire 1583956997 last 1583956920 Mar 11 13:05:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 13:06:30 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ce69cf14-a454-4 (at 10.50.8.20@o2ib2) Mar 11 13:06:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 13:10:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 11 13:10:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 217 previous similar messages Mar 11 13:11:52 fir-io7-s1 kernel: LNetError: 73121:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 13:11:52 fir-io7-s1 kernel: LNetError: 73121:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 11 13:14:17 fir-io7-s1 kernel: LNetError: 73468:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 13:14:17 fir-io7-s1 kernel: LNetError: 73468:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 11 13:20:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 11 13:20:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 231 previous similar messages Mar 11 13:21:52 fir-io7-s1 kernel: LNetError: 73666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 13:21:52 fir-io7-s1 kernel: LNetError: 73666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 11 13:24:17 fir-io7-s1 kernel: LNetError: 73511:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 13:24:17 fir-io7-s1 kernel: LNetError: 73511:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 11 13:30:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 1 seconds Mar 11 13:30:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 368 previous similar messages Mar 11 13:31:57 fir-io7-s1 kernel: LNetError: 73666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 13:31:57 fir-io7-s1 kernel: LNetError: 73666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 11 13:34:17 fir-io7-s1 kernel: LNetError: 74213:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 13:34:17 fir-io7-s1 kernel: LNetError: 74213:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 11 13:40:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 5 seconds Mar 11 13:40:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 439 previous similar messages Mar 11 13:42:07 fir-io7-s1 kernel: LNetError: 45595:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 13:42:07 fir-io7-s1 kernel: LNetError: 45595:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 11 13:44:17 fir-io7-s1 kernel: LNetError: 73953:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 13:44:17 fir-io7-s1 kernel: LNetError: 73953:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 11 13:50:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 4 seconds Mar 11 13:50:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages Mar 11 13:52:12 fir-io7-s1 kernel: LNetError: 74609:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 13:52:12 fir-io7-s1 kernel: LNetError: 74609:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 326 previous similar messages Mar 11 13:54:17 fir-io7-s1 kernel: LNetError: 74978:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 13:54:17 fir-io7-s1 kernel: LNetError: 74978:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 11 14:00:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 11 14:00:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 372 previous similar messages Mar 11 14:02:22 fir-io7-s1 kernel: LNetError: 74944:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 14:02:22 fir-io7-s1 kernel: LNetError: 74944:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 11 14:04:17 fir-io7-s1 kernel: LNetError: 74767:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 14:04:17 fir-io7-s1 kernel: LNetError: 74767:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 11 14:10:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 11 14:10:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 406 previous similar messages Mar 11 14:12:32 fir-io7-s1 kernel: LNetError: 75309:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 14:12:32 fir-io7-s1 kernel: LNetError: 75309:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 14:14:17 fir-io7-s1 kernel: LNetError: 74767:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 14:14:17 fir-io7-s1 kernel: LNetError: 74767:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 11 14:20:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 14:20:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 423 previous similar messages Mar 11 14:22:32 fir-io7-s1 kernel: LNetError: 75660:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 14:22:32 fir-io7-s1 kernel: LNetError: 75660:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 11 14:24:17 fir-io7-s1 kernel: LNetError: 76009:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 14:24:17 fir-io7-s1 kernel: LNetError: 76009:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 11 14:30:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 11 14:30:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 364 previous similar messages Mar 11 14:32:32 fir-io7-s1 kernel: LNetError: 76009:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 14:32:32 fir-io7-s1 kernel: LNetError: 76009:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 11 14:34:17 fir-io7-s1 kernel: LNetError: 76140:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 14:34:17 fir-io7-s1 kernel: LNetError: 76140:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 11 14:41:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 11 14:41:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 287 previous similar messages Mar 11 14:42:32 fir-io7-s1 kernel: LNetError: 76359:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 14:42:32 fir-io7-s1 kernel: LNetError: 76359:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 11 14:44:17 fir-io7-s1 kernel: LNetError: 76668:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 14:44:17 fir-io7-s1 kernel: LNetError: 76668:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 11 14:51:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 11 14:51:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 394 previous similar messages Mar 11 14:52:42 fir-io7-s1 kernel: LNetError: 76709:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 14:52:42 fir-io7-s1 kernel: LNetError: 76709:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 11 14:54:17 fir-io7-s1 kernel: LNetError: 77058:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 14:54:17 fir-io7-s1 kernel: LNetError: 77058:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 11 15:01:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 11 15:01:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 202 previous similar messages Mar 11 15:02:52 fir-io7-s1 kernel: LNetError: 77058:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 15:02:52 fir-io7-s1 kernel: LNetError: 77058:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 391 previous similar messages Mar 11 15:04:22 fir-io7-s1 kernel: LNetError: 77435:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 15:04:22 fir-io7-s1 kernel: LNetError: 77435:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 11 15:11:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 11 15:11:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages Mar 11 15:12:52 fir-io7-s1 kernel: LNetError: 77664:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 15:12:52 fir-io7-s1 kernel: LNetError: 77664:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 11 15:14:22 fir-io7-s1 kernel: LNetError: 77664:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 15:14:22 fir-io7-s1 kernel: LNetError: 77664:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 156 previous similar messages Mar 11 15:21:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 11 15:21:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 148 previous similar messages Mar 11 15:22:52 fir-io7-s1 kernel: LNetError: 60000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 15:22:52 fir-io7-s1 kernel: LNetError: 60000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 419 previous similar messages Mar 11 15:24:22 fir-io7-s1 kernel: LNetError: 77560:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 15:24:22 fir-io7-s1 kernel: LNetError: 77560:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 11 15:32:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 132 seconds Mar 11 15:32:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 60 previous similar messages Mar 11 15:32:52 fir-io7-s1 kernel: LNetError: 77924:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 15:32:52 fir-io7-s1 kernel: LNetError: 77924:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 15:34:22 fir-io7-s1 kernel: LNetError: 77924:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 15:34:22 fir-io7-s1 kernel: LNetError: 77924:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 129 previous similar messages Mar 11 15:42:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 11 15:42:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 165 previous similar messages Mar 11 15:42:57 fir-io7-s1 kernel: LNetError: 78617:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 15:42:57 fir-io7-s1 kernel: LNetError: 78617:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 15:44:27 fir-io7-s1 kernel: LNetError: 78840:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 15:44:27 fir-io7-s1 kernel: LNetError: 78840:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 147 previous similar messages Mar 11 15:50:14 fir-io7-s1 kernel: Lustre: fir-OST0052: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) Mar 11 15:50:14 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) Mar 11 15:50:14 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 11 15:50:59 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client f97c9058-7bce-4 (at 10.49.0.63@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f5e0400, cur 1583967059 expire 1583966909 last 1583966832 Mar 11 15:50:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 11 15:52:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 11 15:52:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 88 previous similar messages Mar 11 15:52:57 fir-io7-s1 kernel: LNetError: 78840:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 15:52:57 fir-io7-s1 kernel: LNetError: 78840:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 15:54:27 fir-io7-s1 kernel: LNetError: 79184:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 15:54:27 fir-io7-s1 kernel: LNetError: 79184:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 127 previous similar messages Mar 11 16:02:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 11 16:02:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 147 previous similar messages Mar 11 16:03:02 fir-io7-s1 kernel: LNetError: 79550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 16:03:02 fir-io7-s1 kernel: LNetError: 79550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 16:04:32 fir-io7-s1 kernel: LNetError: 79550:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 16:04:32 fir-io7-s1 kernel: LNetError: 79550:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 139 previous similar messages Mar 11 16:12:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 1 seconds Mar 11 16:12:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 84 previous similar messages Mar 11 16:13:02 fir-io7-s1 kernel: LNetError: 79760:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 16:13:02 fir-io7-s1 kernel: LNetError: 79760:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 16:14:32 fir-io7-s1 kernel: LNetError: 79760:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 16:14:32 fir-io7-s1 kernel: LNetError: 79760:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 138 previous similar messages Mar 11 16:20:26 fir-io7-s1 kernel: Lustre: fir-OST0052: Connection restored to e225f3d7-7aff-4 (at 10.50.0.62@o2ib2) Mar 11 16:20:26 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 11 16:22:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 16:22:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 65 previous similar messages Mar 11 16:23:07 fir-io7-s1 kernel: LNetError: 80190:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 16:23:07 fir-io7-s1 kernel: LNetError: 80190:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 16:24:37 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 16:24:37 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages Mar 11 16:32:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 16:32:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 116 previous similar messages Mar 11 16:33:07 fir-io7-s1 kernel: LNetError: 80497:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 16:33:07 fir-io7-s1 kernel: LNetError: 80497:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 16:34:37 fir-io7-s1 kernel: LNetError: 80497:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 16:34:37 fir-io7-s1 kernel: LNetError: 80497:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 138 previous similar messages Mar 11 16:38:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) Mar 11 16:38:18 fir-io7-s1 kernel: Lustre: Skipped 7 previous similar messages Mar 11 16:39:12 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client bbd23226-f0ab-4 (at 10.49.0.63@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c766cf66000, cur 1583969952 expire 1583969802 last 1583969725 Mar 11 16:39:12 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 11 16:39:22 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client bbd23226-f0ab-4 (at 10.49.0.63@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd34a400, cur 1583969962 expire 1583969812 last 1583969735 Mar 11 16:39:22 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 11 16:43:12 fir-io7-s1 kernel: LNetError: 45595:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 16:43:12 fir-io7-s1 kernel: LNetError: 45595:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 419 previous similar messages Mar 11 16:43:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 11 16:43:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 73 previous similar messages Mar 11 16:44:37 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 9faebe15-a529-4 (at 10.49.0.63@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c773ef62400, cur 1583970277 expire 1583970127 last 1583970050 Mar 11 16:44:37 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 11 16:44:42 fir-io7-s1 kernel: LNetError: 80987:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 16:44:42 fir-io7-s1 kernel: LNetError: 80987:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 138 previous similar messages Mar 11 16:53:12 fir-io7-s1 kernel: LNetError: 80987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 16:53:12 fir-io7-s1 kernel: LNetError: 80987:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 16:53:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 11 16:53:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 144 previous similar messages Mar 11 16:54:42 fir-io7-s1 kernel: LNetError: 80921:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 16:54:42 fir-io7-s1 kernel: LNetError: 80921:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 11 17:03:13 fir-io7-s1 kernel: LNetError: 81690:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 17:03:13 fir-io7-s1 kernel: LNetError: 81690:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 418 previous similar messages Mar 11 17:04:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 11 17:04:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 67 previous similar messages Mar 11 17:04:43 fir-io7-s1 kernel: LNetError: 45595:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 17:04:43 fir-io7-s1 kernel: LNetError: 45595:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 11 17:13:13 fir-io7-s1 kernel: LNetError: 81958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 17:13:13 fir-io7-s1 kernel: LNetError: 81958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 17:14:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 11 17:14:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 138 previous similar messages Mar 11 17:14:43 fir-io7-s1 kernel: LNetError: 81751:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 17:14:43 fir-io7-s1 kernel: LNetError: 81751:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages Mar 11 17:23:14 fir-io7-s1 kernel: LNetError: 81958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 17:23:14 fir-io7-s1 kernel: LNetError: 81958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 17:23:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 7ef34a8a-27c8-4 (at 10.49.0.63@o2ib1) Mar 11 17:23:28 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 11 17:24:48 fir-io7-s1 kernel: LNetError: 82309:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 17:24:48 fir-io7-s1 kernel: LNetError: 82309:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 11 17:25:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 11 17:25:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 46 previous similar messages Mar 11 17:33:18 fir-io7-s1 kernel: LNetError: 82617:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 17:33:18 fir-io7-s1 kernel: LNetError: 82617:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 17:34:48 fir-io7-s1 kernel: LNetError: 82617:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 17:34:48 fir-io7-s1 kernel: LNetError: 82617:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 135 previous similar messages Mar 11 17:35:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 17:35:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 125 previous similar messages Mar 11 17:43:18 fir-io7-s1 kernel: LNetError: 82800:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 17:43:18 fir-io7-s1 kernel: LNetError: 82800:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 17:44:49 fir-io7-s1 kernel: LNetError: 82415:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 17:44:49 fir-io7-s1 kernel: LNetError: 82415:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 11 17:46:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 11 17:46:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 52 previous similar messages Mar 11 17:53:23 fir-io7-s1 kernel: LNetError: 83397:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 17:53:23 fir-io7-s1 kernel: LNetError: 83397:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 17:54:53 fir-io7-s1 kernel: LNetError: 82415:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 17:54:53 fir-io7-s1 kernel: LNetError: 82415:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 11 17:57:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 11 17:57:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 164 previous similar messages Mar 11 18:03:23 fir-io7-s1 kernel: LNetError: 83802:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 18:03:23 fir-io7-s1 kernel: LNetError: 83802:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 18:04:53 fir-io7-s1 kernel: LNetError: 83802:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 18:04:53 fir-io7-s1 kernel: LNetError: 83802:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 127 previous similar messages Mar 11 18:07:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 18:07:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 31 previous similar messages Mar 11 18:13:28 fir-io7-s1 kernel: LNetError: 83802:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 18:13:28 fir-io7-s1 kernel: LNetError: 83802:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 18:14:58 fir-io7-s1 kernel: LNetError: 84184:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 18:14:58 fir-io7-s1 kernel: LNetError: 84184:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 11 18:17:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 18:17:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 181 previous similar messages Mar 11 18:23:28 fir-io7-s1 kernel: LNetError: 84184:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 18:23:28 fir-io7-s1 kernel: LNetError: 84184:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 18:24:58 fir-io7-s1 kernel: LNetError: 84184:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 18:24:58 fir-io7-s1 kernel: LNetError: 84184:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages Mar 11 18:28:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 18:28:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 65 previous similar messages Mar 11 18:33:33 fir-io7-s1 kernel: LNetError: 84594:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 18:33:33 fir-io7-s1 kernel: LNetError: 84594:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 18:35:03 fir-io7-s1 kernel: LNetError: 84865:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 18:35:03 fir-io7-s1 kernel: LNetError: 84865:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 126 previous similar messages Mar 11 18:38:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 11 18:38:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 165 previous similar messages Mar 11 18:43:33 fir-io7-s1 kernel: LNetError: 84594:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 18:43:33 fir-io7-s1 kernel: LNetError: 84594:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 18:45:03 fir-io7-s1 kernel: LNetError: 84865:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 18:45:03 fir-io7-s1 kernel: LNetError: 84865:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 145 previous similar messages Mar 11 18:48:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 11 18:48:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 104 previous similar messages Mar 11 18:53:33 fir-io7-s1 kernel: LNetError: 85249:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 18:53:33 fir-io7-s1 kernel: LNetError: 85249:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 18:55:03 fir-io7-s1 kernel: LNetError: 85249:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 18:55:03 fir-io7-s1 kernel: LNetError: 85249:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 131 previous similar messages Mar 11 18:59:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 11 18:59:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 57 previous similar messages Mar 11 19:03:38 fir-io7-s1 kernel: LNetError: 85249:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 19:03:38 fir-io7-s1 kernel: LNetError: 85249:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 19:05:08 fir-io7-s1 kernel: LNetError: 85249:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 19:05:08 fir-io7-s1 kernel: LNetError: 85249:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 11 19:09:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 11 19:09:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 129 previous similar messages Mar 11 19:13:38 fir-io7-s1 kernel: LNetError: 86296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 19:13:38 fir-io7-s1 kernel: LNetError: 86296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 19:15:08 fir-io7-s1 kernel: LNetError: 86228:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 19:15:08 fir-io7-s1 kernel: LNetError: 86228:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 129 previous similar messages Mar 11 19:20:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 11 19:20:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 64 previous similar messages Mar 11 19:23:43 fir-io7-s1 kernel: LNetError: 86296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 19:23:43 fir-io7-s1 kernel: LNetError: 86296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 19:25:13 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 19:25:13 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 131 previous similar messages Mar 11 19:30:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 11 19:30:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 138 previous similar messages Mar 11 19:33:43 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 19:33:43 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 19:35:13 fir-io7-s1 kernel: LNetError: 87076:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 19:35:13 fir-io7-s1 kernel: LNetError: 87076:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 134 previous similar messages Mar 11 19:41:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 11 19:41:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 59 previous similar messages Mar 11 19:43:44 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 19:43:44 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 19:45:14 fir-io7-s1 kernel: LNetError: 84965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 19:45:14 fir-io7-s1 kernel: LNetError: 84965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 11 19:52:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 11 19:52:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 91 previous similar messages Mar 11 19:53:44 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 19:53:44 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 19:55:14 fir-io7-s1 kernel: LNetError: 86890:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 19:55:14 fir-io7-s1 kernel: LNetError: 86890:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 139 previous similar messages Mar 11 20:03:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 11 20:03:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 68 previous similar messages Mar 11 20:03:45 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 20:03:45 fir-io7-s1 kernel: LNetError: 86905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 20:05:15 fir-io7-s1 kernel: LNetError: 86905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 20:05:15 fir-io7-s1 kernel: LNetError: 86905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 11 20:13:45 fir-io7-s1 kernel: LNetError: 88323:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 20:13:45 fir-io7-s1 kernel: LNetError: 88323:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 20:14:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 11 20:14:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 173 previous similar messages Mar 11 20:15:15 fir-io7-s1 kernel: LNetError: 87884:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 20:15:15 fir-io7-s1 kernel: LNetError: 87884:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 11 20:23:45 fir-io7-s1 kernel: LNetError: 88783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 20:23:45 fir-io7-s1 kernel: LNetError: 88783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 20:25:15 fir-io7-s1 kernel: LNetError: 88783:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 20:25:15 fir-io7-s1 kernel: LNetError: 88783:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages Mar 11 20:25:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 11 20:25:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 92 previous similar messages Mar 11 20:33:50 fir-io7-s1 kernel: LNetError: 88783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 20:33:50 fir-io7-s1 kernel: LNetError: 88783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 20:35:20 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 20:35:20 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 136 previous similar messages Mar 11 20:35:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 11 20:35:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 133 previous similar messages Mar 11 20:43:50 fir-io7-s1 kernel: LNetError: 72346:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 20:43:50 fir-io7-s1 kernel: LNetError: 72346:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 419 previous similar messages Mar 11 20:45:20 fir-io7-s1 kernel: LNetError: 89405:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 20:45:20 fir-io7-s1 kernel: LNetError: 89405:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 134 previous similar messages Mar 11 20:45:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 2 seconds Mar 11 20:45:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 82 previous similar messages Mar 11 20:53:51 fir-io7-s1 kernel: LNetError: 89632:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 20:53:51 fir-io7-s1 kernel: LNetError: 89632:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 20:55:25 fir-io7-s1 kernel: LNetError: 89525:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 20:55:25 fir-io7-s1 kernel: LNetError: 89525:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 11 20:56:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 11 20:56:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 49 previous similar messages Mar 11 21:03:55 fir-io7-s1 kernel: LNetError: 89632:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 21:03:55 fir-io7-s1 kernel: LNetError: 89632:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 21:05:25 fir-io7-s1 kernel: LNetError: 90282:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 21:05:25 fir-io7-s1 kernel: LNetError: 90282:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 11 21:06:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 11 21:06:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 134 previous similar messages Mar 11 21:13:55 fir-io7-s1 kernel: LNetError: 90534:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 21:13:55 fir-io7-s1 kernel: LNetError: 90534:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 21:15:25 fir-io7-s1 kernel: LNetError: 90534:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 21:15:25 fir-io7-s1 kernel: LNetError: 90534:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 127 previous similar messages Mar 11 21:16:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 11 21:16:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 46 previous similar messages Mar 11 21:24:00 fir-io7-s1 kernel: LNetError: 90534:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 21:24:00 fir-io7-s1 kernel: LNetError: 90534:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 21:25:26 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 21:25:26 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 11 21:27:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 11 21:27:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 49 previous similar messages Mar 11 21:34:00 fir-io7-s1 kernel: LNetError: 91042:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 21:34:00 fir-io7-s1 kernel: LNetError: 91042:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 21:35:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 21:35:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages Mar 11 21:37:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 11 21:37:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 166 previous similar messages Mar 11 21:44:00 fir-io7-s1 kernel: LNetError: 91395:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 21:44:00 fir-io7-s1 kernel: LNetError: 91395:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 21:45:30 fir-io7-s1 kernel: LNetError: 91747:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 21:45:30 fir-io7-s1 kernel: LNetError: 91747:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 154 previous similar messages Mar 11 21:48:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 11 21:48:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 85 previous similar messages Mar 11 21:54:00 fir-io7-s1 kernel: LNetError: 91747:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 21:54:00 fir-io7-s1 kernel: LNetError: 91747:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 21:55:30 fir-io7-s1 kernel: LNetError: 92092:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 21:55:30 fir-io7-s1 kernel: LNetError: 92092:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 139 previous similar messages Mar 11 21:59:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 11 21:59:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 38 previous similar messages Mar 11 22:04:00 fir-io7-s1 kernel: LNetError: 92092:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 22:04:00 fir-io7-s1 kernel: LNetError: 92092:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 22:05:30 fir-io7-s1 kernel: LNetError: 92456:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 22:05:30 fir-io7-s1 kernel: LNetError: 92456:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 131 previous similar messages Mar 11 22:10:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 11 22:10:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 58 previous similar messages Mar 11 22:14:01 fir-io7-s1 kernel: LNetError: 92650:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 22:14:01 fir-io7-s1 kernel: LNetError: 92650:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 417 previous similar messages Mar 11 22:15:35 fir-io7-s1 kernel: LNetError: 91100:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 22:15:35 fir-io7-s1 kernel: LNetError: 91100:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages Mar 11 22:20:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 11 22:20:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 22 previous similar messages Mar 11 22:24:15 fir-io7-s1 kernel: LNetError: 92942:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 22:24:15 fir-io7-s1 kernel: LNetError: 92942:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 406 previous similar messages Mar 11 22:25:35 fir-io7-s1 kernel: LNetError: 92479:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 22:25:35 fir-io7-s1 kernel: LNetError: 92479:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages Mar 11 22:31:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 11 22:31:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 328 previous similar messages Mar 11 22:34:15 fir-io7-s1 kernel: LNetError: 93182:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 11 22:34:15 fir-io7-s1 kernel: LNetError: 93182:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 11 22:35:40 fir-io7-s1 kernel: LNetError: 93527:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 22:35:40 fir-io7-s1 kernel: LNetError: 93527:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 11 22:41:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 11 22:41:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 302 previous similar messages Mar 11 22:44:20 fir-io7-s1 kernel: LNetError: 93527:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 22:44:20 fir-io7-s1 kernel: LNetError: 93527:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 392 previous similar messages Mar 11 22:45:40 fir-io7-s1 kernel: LNetError: 92479:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 22:45:40 fir-io7-s1 kernel: LNetError: 92479:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 11 22:52:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 11 22:52:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 64 previous similar messages Mar 11 22:54:20 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 11 22:54:20 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 405 previous similar messages Mar 11 22:55:40 fir-io7-s1 kernel: LNetError: 94126:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 22:55:40 fir-io7-s1 kernel: LNetError: 94126:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 145 previous similar messages Mar 11 23:03:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 11 23:03:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 54 previous similar messages Mar 11 23:04:20 fir-io7-s1 kernel: LNetError: 94126:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 23:04:20 fir-io7-s1 kernel: LNetError: 94126:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 11 23:05:40 fir-io7-s1 kernel: LNetError: 94588:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 23:05:40 fir-io7-s1 kernel: LNetError: 94588:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 11 23:13:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 11 23:13:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 46 previous similar messages Mar 11 23:14:20 fir-io7-s1 kernel: LNetError: 94773:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 11 23:14:20 fir-io7-s1 kernel: LNetError: 94773:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 356 previous similar messages Mar 11 23:15:40 fir-io7-s1 kernel: LNetError: 94773:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 23:15:40 fir-io7-s1 kernel: LNetError: 94773:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 156 previous similar messages Mar 11 23:24:20 fir-io7-s1 kernel: LNetError: 94974:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 23:24:20 fir-io7-s1 kernel: LNetError: 94974:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 380 previous similar messages Mar 11 23:24:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 11 23:24:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 57 previous similar messages Mar 11 23:25:40 fir-io7-s1 kernel: LNetError: 95290:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 23:25:40 fir-io7-s1 kernel: LNetError: 95290:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 139 previous similar messages Mar 11 23:34:20 fir-io7-s1 kernel: LNetError: 95290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 11 23:34:20 fir-io7-s1 kernel: LNetError: 95290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 365 previous similar messages Mar 11 23:34:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 11 23:34:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 48 previous similar messages Mar 11 23:35:40 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 23:35:40 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 148 previous similar messages Mar 11 23:44:21 fir-io7-s1 kernel: LNetError: 95642:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 11 23:44:21 fir-io7-s1 kernel: LNetError: 95642:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 11 23:44:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 11 23:44:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 69 previous similar messages Mar 11 23:45:41 fir-io7-s1 kernel: LNetError: 95990:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 11 23:45:41 fir-io7-s1 kernel: LNetError: 95990:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 157 previous similar messages Mar 11 23:54:30 fir-io7-s1 kernel: LNetError: 96200:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 11 23:54:30 fir-io7-s1 kernel: LNetError: 96200:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 330 previous similar messages Mar 11 23:54:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 11 23:54:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 38 previous similar messages Mar 11 23:55:43 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 11 23:55:43 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 143 previous similar messages Mar 12 00:04:35 fir-io7-s1 kernel: LNetError: 96421:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 00:04:35 fir-io7-s1 kernel: LNetError: 96421:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 12 00:05:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 12 00:05:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 86 previous similar messages Mar 12 00:05:45 fir-io7-s1 kernel: LNetError: 96421:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 00:05:45 fir-io7-s1 kernel: LNetError: 96421:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 12 00:14:35 fir-io7-s1 kernel: LNetError: 96421:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 00:14:35 fir-io7-s1 kernel: LNetError: 96421:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 12 00:15:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 12 00:15:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 102 previous similar messages Mar 12 00:15:45 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 00:15:45 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 12 00:24:35 fir-io7-s1 kernel: LNetError: 97423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 00:24:35 fir-io7-s1 kernel: LNetError: 97423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 12 00:25:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 12 00:25:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 656 previous similar messages Mar 12 00:25:45 fir-io7-s1 kernel: LNetError: 97423:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 00:25:45 fir-io7-s1 kernel: LNetError: 97423:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 12 00:34:45 fir-io7-s1 kernel: LNetError: 97423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 00:34:45 fir-io7-s1 kernel: LNetError: 97423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 334 previous similar messages Mar 12 00:35:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 12 00:35:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 481 previous similar messages Mar 12 00:35:45 fir-io7-s1 kernel: LNetError: 96023:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 00:35:45 fir-io7-s1 kernel: LNetError: 96023:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 12 00:38:52 fir-io7-s1 kernel: md: md4: data-check done. 
Mar 12 00:44:45 fir-io7-s1 kernel: LNetError: 97423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 00:44:45 fir-io7-s1 kernel: LNetError: 97423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 12 00:45:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 12 00:45:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages Mar 12 00:45:45 fir-io7-s1 kernel: LNetError: 98142:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 00:45:45 fir-io7-s1 kernel: LNetError: 98142:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 12 00:54:55 fir-io7-s1 kernel: LNetError: 98142:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 00:54:55 fir-io7-s1 kernel: LNetError: 98142:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 12 00:55:45 fir-io7-s1 kernel: LNetError: 98499:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 00:55:45 fir-io7-s1 kernel: LNetError: 98499:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 12 00:55:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 12 00:55:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 361 previous similar messages Mar 12 01:04:55 fir-io7-s1 kernel: LNetError: 98499:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 01:04:55 fir-io7-s1 kernel: LNetError: 98499:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 12 01:05:45 fir-io7-s1 kernel: LNetError: 98888:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 01:05:45 fir-io7-s1 kernel: LNetError: 98888:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 12 01:05:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 3 seconds Mar 12 01:05:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 371 previous similar messages Mar 12 01:15:00 fir-io7-s1 kernel: LNetError: 98888:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 01:15:00 fir-io7-s1 kernel: LNetError: 98888:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 309 previous similar messages Mar 12 01:15:45 fir-io7-s1 kernel: LNetError: 99241:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 01:15:45 fir-io7-s1 kernel: LNetError: 99241:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 211 previous similar messages Mar 12 01:15:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 12 01:15:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 385 previous similar messages Mar 12 01:25:00 fir-io7-s1 kernel: LNetError: 99241:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 01:25:00 fir-io7-s1 kernel: LNetError: 99241:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 01:25:45 fir-io7-s1 kernel: LNetError: 99588:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 01:25:45 fir-io7-s1 kernel: LNetError: 99588:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 12 01:26:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 4 seconds Mar 12 01:26:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 389 previous similar messages Mar 12 01:35:00 fir-io7-s1 kernel: LNetError: 99588:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 01:35:00 fir-io7-s1 kernel: LNetError: 99588:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 12 01:35:45 fir-io7-s1 kernel: LNetError: 99941:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 01:35:45 fir-io7-s1 kernel: LNetError: 99941:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 12 01:36:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 4 seconds Mar 12 01:36:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 343 previous similar messages Mar 12 01:45:10 fir-io7-s1 kernel: LNetError: 100326:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 01:45:10 fir-io7-s1 kernel: LNetError: 100326:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 12 01:45:45 fir-io7-s1 kernel: LNetError: 99860:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 01:45:45 fir-io7-s1 kernel: LNetError: 99860:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 12 01:46:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 5 seconds Mar 12 01:46:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 306 previous similar messages Mar 12 01:55:10 fir-io7-s1 kernel: LNetError: 100720:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 01:55:10 fir-io7-s1 kernel: LNetError: 100720:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 311 previous similar messages Mar 12 01:55:45 fir-io7-s1 kernel: LNetError: 100327:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 01:55:45 fir-io7-s1 kernel: LNetError: 100327:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 12 01:56:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 3 seconds Mar 12 01:56:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 244 previous similar messages Mar 12 02:05:10 fir-io7-s1 kernel: LNetError: 101018:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 02:05:10 fir-io7-s1 kernel: LNetError: 101018:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 12 02:05:45 fir-io7-s1 kernel: LNetError: 101018:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 02:05:45 fir-io7-s1 kernel: LNetError: 101018:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 224 previous similar messages Mar 12 02:06:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 12 02:06:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 520 previous similar messages Mar 12 02:15:10 fir-io7-s1 kernel: LNetError: 101018:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 02:15:10 fir-io7-s1 kernel: LNetError: 101018:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 332 previous similar messages Mar 12 02:15:45 fir-io7-s1 kernel: LNetError: 101505:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 02:15:45 fir-io7-s1 kernel: LNetError: 101505:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 215 previous similar messages Mar 12 02:16:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 3 seconds Mar 12 02:16:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 449 previous similar messages Mar 12 02:25:10 fir-io7-s1 kernel: LNetError: 101505:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 02:25:10 fir-io7-s1 kernel: LNetError: 101505:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 12 02:25:45 fir-io7-s1 kernel: LNetError: 100327:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 02:25:45 fir-io7-s1 kernel: LNetError: 100327:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 12 02:26:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 12 02:26:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 403 previous similar messages Mar 12 02:35:15 fir-io7-s1 kernel: LNetError: 101855:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 02:35:15 fir-io7-s1 kernel: LNetError: 101855:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 02:35:50 fir-io7-s1 kernel: LNetError: 100327:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 02:35:50 fir-io7-s1 kernel: LNetError: 100327:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages Mar 12 02:36:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 12 02:36:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 390 previous similar messages Mar 12 02:45:25 fir-io7-s1 kernel: LNetError: 102437:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 02:45:25 fir-io7-s1 kernel: LNetError: 102437:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 02:45:55 fir-io7-s1 kernel: LNetError: 102437:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 02:45:55 fir-io7-s1 kernel: LNetError: 102437:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 12 02:46:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 12 02:46:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 562 previous similar messages Mar 12 02:55:25 fir-io7-s1 kernel: LNetError: 102668:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 02:55:25 fir-io7-s1 kernel: LNetError: 102668:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 02:55:55 fir-io7-s1 kernel: LNetError: 102537:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 02:55:55 fir-io7-s1 kernel: LNetError: 102537:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 12 02:57:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 12 02:57:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 455 previous similar messages Mar 12 03:05:25 fir-io7-s1 kernel: LNetError: 103146:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 03:05:25 fir-io7-s1 kernel: LNetError: 103146:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 03:05:55 fir-io7-s1 kernel: LNetError: 103146:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 03:05:55 fir-io7-s1 kernel: LNetError: 103146:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 12 03:07:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 2 seconds Mar 12 03:07:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 367 previous similar messages Mar 12 03:15:25 fir-io7-s1 kernel: LNetError: 103146:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 03:15:25 fir-io7-s1 kernel: LNetError: 103146:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 277 previous similar messages Mar 12 03:15:55 fir-io7-s1 kernel: LNetError: 103253:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 03:15:55 fir-io7-s1 kernel: LNetError: 103253:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 12 03:17:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 12 03:17:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 426 previous similar messages Mar 12 03:25:35 fir-io7-s1 kernel: LNetError: 103629:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 03:25:35 fir-io7-s1 kernel: LNetError: 103629:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 03:26:05 fir-io7-s1 kernel: LNetError: 103981:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 03:26:05 fir-io7-s1 kernel: LNetError: 103981:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 12 03:27:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 03:27:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 382 previous similar messages Mar 12 03:35:35 fir-io7-s1 kernel: LNetError: 103981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 03:35:35 fir-io7-s1 kernel: LNetError: 103981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 12 03:36:05 fir-io7-s1 kernel: LNetError: 104349:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 03:36:05 fir-io7-s1 kernel: LNetError: 104349:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 12 03:37:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 12 03:37:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 307 previous similar messages Mar 12 03:45:40 fir-io7-s1 kernel: LNetError: 104349:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 03:45:40 fir-io7-s1 kernel: LNetError: 104349:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 12 03:46:10 fir-io7-s1 kernel: LNetError: 104349:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 03:46:10 fir-io7-s1 kernel: LNetError: 104349:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 12 03:47:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 7 seconds Mar 12 03:47:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 371 previous similar messages Mar 12 03:55:45 fir-io7-s1 kernel: LNetError: 104349:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 03:55:45 fir-io7-s1 kernel: LNetError: 104349:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 12 03:56:15 fir-io7-s1 kernel: LNetError: 105050:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 03:56:15 fir-io7-s1 kernel: LNetError: 105050:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 12 03:57:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 03:57:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 357 previous similar messages Mar 12 04:05:55 fir-io7-s1 kernel: LNetError: 105343:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 04:05:55 fir-io7-s1 kernel: LNetError: 105343:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 12 04:06:15 fir-io7-s1 kernel: LNetError: 104908:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 04:06:15 fir-io7-s1 kernel: LNetError: 104908:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 12 04:07:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 12 04:07:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 12 04:15:55 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 04:15:55 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 04:16:15 fir-io7-s1 kernel: LNetError: 105528:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 04:16:15 fir-io7-s1 kernel: LNetError: 105528:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 12 04:17:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 12 04:17:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 246 previous similar messages Mar 12 04:25:55 fir-io7-s1 kernel: LNetError: 105800:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 04:25:55 fir-io7-s1 kernel: LNetError: 105800:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 12 04:26:15 fir-io7-s1 kernel: LNetError: 106122:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 04:26:15 fir-io7-s1 kernel: LNetError: 106122:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 12 04:28:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 12 04:28:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 202 previous similar messages Mar 12 04:35:55 fir-io7-s1 kernel: LNetError: 106328:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 04:35:55 fir-io7-s1 kernel: LNetError: 106328:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 04:36:15 fir-io7-s1 kernel: LNetError: 105873:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 04:36:15 fir-io7-s1 kernel: LNetError: 105873:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 12 04:38:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 2 seconds Mar 12 04:38:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 478 previous similar messages Mar 12 04:46:00 fir-io7-s1 kernel: LNetError: 106328:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 04:46:00 fir-io7-s1 kernel: LNetError: 106328:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 04:46:15 fir-io7-s1 kernel: LNetError: 106823:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 04:46:15 fir-io7-s1 kernel: LNetError: 106823:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 12 04:48:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 7 seconds Mar 12 04:48:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 507 previous similar messages Mar 12 04:56:00 fir-io7-s1 kernel: LNetError: 106823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 04:56:00 fir-io7-s1 kernel: LNetError: 106823:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 12 04:56:15 fir-io7-s1 kernel: LNetError: 107172:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 04:56:15 fir-io7-s1 kernel: LNetError: 107172:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 131 previous similar messages Mar 12 04:58:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 12 04:58:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 466 previous similar messages Mar 12 05:06:00 fir-io7-s1 kernel: LNetError: 107389:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 05:06:00 fir-io7-s1 kernel: LNetError: 107389:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 12 05:06:15 fir-io7-s1 kernel: LNetError: 107389:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 05:06:15 fir-io7-s1 kernel: LNetError: 107389:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 12 05:08:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 12 05:08:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 330 previous similar messages Mar 12 05:16:00 fir-io7-s1 kernel: LNetError: 107389:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 05:16:00 fir-io7-s1 kernel: LNetError: 107389:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 12 05:16:15 fir-io7-s1 kernel: LNetError: 107891:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 05:16:15 fir-io7-s1 kernel: LNetError: 107891:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 12 05:18:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 12 05:18:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 224 previous similar messages Mar 12 05:26:05 fir-io7-s1 kernel: LNetError: 107891:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 05:26:05 fir-io7-s1 kernel: LNetError: 107891:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 12 05:26:15 fir-io7-s1 kernel: LNetError: 107850:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 05:26:15 fir-io7-s1 kernel: LNetError: 107850:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 201 previous similar messages Mar 12 05:28:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 4 seconds Mar 12 05:28:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 253 previous similar messages Mar 12 05:31:35 fir-io7-s1 kernel: LustreError: 90867:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST0048: cli 26b2dfcf-77f7-4 claims 32768 GRANT, real grant 0 Mar 12 05:36:05 fir-io7-s1 kernel: LNetError: 108256:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 05:36:05 fir-io7-s1 kernel: LNetError: 108256:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 12 05:36:19 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 05:36:19 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 12 05:38:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 12 05:38:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 359 previous similar messages Mar 12 05:46:05 fir-io7-s1 kernel: LNetError: 108604:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 05:46:05 fir-io7-s1 kernel: LNetError: 108604:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 05:46:20 fir-io7-s1 kernel: LNetError: 108957:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 05:46:20 fir-io7-s1 kernel: LNetError: 108957:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 12 05:48:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 3 seconds Mar 12 05:48:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 322 previous similar messages Mar 12 05:56:10 fir-io7-s1 kernel: LNetError: 108957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 05:56:10 fir-io7-s1 kernel: LNetError: 108957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 05:56:20 fir-io7-s1 kernel: LNetError: 109312:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 05:56:20 fir-io7-s1 kernel: LNetError: 109312:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 12 05:58:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 12 05:58:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 349 previous similar messages Mar 12 06:06:10 fir-io7-s1 kernel: LNetError: 109312:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 06:06:10 fir-io7-s1 kernel: LNetError: 109312:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 06:06:20 fir-io7-s1 kernel: LNetError: 109622:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 06:06:20 fir-io7-s1 kernel: LNetError: 109622:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 12 06:08:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 12 06:08:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 182 previous similar messages Mar 12 06:16:10 fir-io7-s1 kernel: LNetError: 109990:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 06:16:10 fir-io7-s1 kernel: LNetError: 109990:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 06:16:25 fir-io7-s1 kernel: LNetError: 109990:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 06:16:25 fir-io7-s1 kernel: LNetError: 109990:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 12 06:18:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 12 06:18:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 285 previous similar messages Mar 12 06:26:10 fir-io7-s1 kernel: LNetError: 109990:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 06:26:10 fir-io7-s1 kernel: LNetError: 109990:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 06:26:25 fir-io7-s1 kernel: LNetError: 110380:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 06:26:25 fir-io7-s1 kernel: LNetError: 110380:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 150 previous similar messages Mar 12 06:29:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 12 06:29:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 12 06:36:20 fir-io7-s1 kernel: LNetError: 110380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 06:36:20 fir-io7-s1 kernel: LNetError: 110380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 06:36:25 fir-io7-s1 kernel: LNetError: 110528:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 06:36:25 fir-io7-s1 kernel: LNetError: 110528:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 12 06:39:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 7 seconds Mar 12 06:39:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 490 previous similar messages Mar 12 06:46:20 fir-io7-s1 kernel: LNetError: 110736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 06:46:20 fir-io7-s1 kernel: LNetError: 110736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 06:46:25 fir-io7-s1 kernel: LNetError: 110278:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 06:46:25 fir-io7-s1 kernel: LNetError: 110278:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 12 06:49:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 5 seconds Mar 12 06:49:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 457 previous similar messages Mar 12 06:56:20 fir-io7-s1 kernel: LNetError: 111085:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 06:56:20 fir-io7-s1 kernel: LNetError: 111085:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 12 06:56:25 fir-io7-s1 kernel: LNetError: 110278:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 06:56:25 fir-io7-s1 kernel: LNetError: 110278:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 12 06:59:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 12 06:59:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 320 previous similar messages Mar 12 07:06:25 fir-io7-s1 kernel: LNetError: 111469:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 07:06:25 fir-io7-s1 kernel: LNetError: 111469:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 12 07:06:25 fir-io7-s1 kernel: LNetError: 108645:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 07:06:25 fir-io7-s1 kernel: LNetError: 108645:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 12 07:09:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 12 07:09:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 286 previous similar messages Mar 12 07:16:25 fir-io7-s1 kernel: LNetError: 111354:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 07:16:25 fir-io7-s1 kernel: LNetError: 111354:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 12 07:16:35 fir-io7-s1 kernel: LNetError: 111831:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 07:16:35 fir-io7-s1 kernel: LNetError: 111831:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 303 previous similar messages Mar 12 07:19:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 2 seconds Mar 12 07:19:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 377 previous similar messages Mar 12 07:26:30 fir-io7-s1 kernel: LNetError: 112490:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 07:26:30 fir-io7-s1 kernel: LNetError: 112490:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 12 07:26:45 fir-io7-s1 kernel: LNetError: 112490:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 07:26:45 fir-io7-s1 kernel: LNetError: 112490:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 07:29:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 12 07:29:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 414 previous similar messages Mar 12 07:36:35 fir-io7-s1 kernel: LNetError: 112675:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 07:36:35 fir-io7-s1 kernel: LNetError: 112675:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 12 07:36:50 fir-io7-s1 kernel: LNetError: 112887:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 07:36:50 fir-io7-s1 kernel: LNetError: 112887:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 07:39:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 12 07:39:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages Mar 12 07:46:35 fir-io7-s1 kernel: LNetError: 112846:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 07:46:35 fir-io7-s1 kernel: LNetError: 112846:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 12 07:47:00 fir-io7-s1 kernel: LNetError: 112887:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 07:47:00 fir-io7-s1 kernel: LNetError: 112887:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 07:49:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 6 seconds Mar 12 07:49:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 290 previous similar messages Mar 12 07:56:36 fir-io7-s1 kernel: LNetError: 113059:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 07:56:36 fir-io7-s1 kernel: LNetError: 113059:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 12 07:57:06 fir-io7-s1 kernel: LNetError: 113460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 07:57:06 fir-io7-s1 kernel: LNetError: 113460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 07:59:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 07:59:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 373 previous similar messages Mar 12 08:06:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 08:06:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 12 08:07:20 fir-io7-s1 kernel: LNetError: 113460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 08:07:20 fir-io7-s1 kernel: LNetError: 113460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 12 08:10:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 1 seconds Mar 12 08:10:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 292 previous similar messages Mar 12 08:16:39 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 08:16:39 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 12 08:17:20 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 08:17:20 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 326 previous similar messages Mar 12 08:20:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 12 08:20:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages Mar 12 08:26:40 fir-io7-s1 kernel: LNetError: 113993:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 08:26:40 fir-io7-s1 kernel: LNetError: 113993:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 12 08:27:25 fir-io7-s1 kernel: LNetError: 114668:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 08:27:25 fir-io7-s1 kernel: LNetError: 114668:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 12 08:30:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 5 seconds Mar 12 08:30:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 277 previous similar messages Mar 12 08:36:40 fir-io7-s1 kernel: LNetError: 114668:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 08:36:40 fir-io7-s1 kernel: LNetError: 114668:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 12 08:37:25 fir-io7-s1 kernel: LNetError: 115014:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 08:37:25 fir-io7-s1 kernel: LNetError: 115014:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 08:40:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 9 seconds Mar 12 08:40:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 422 previous similar messages Mar 12 08:46:40 fir-io7-s1 kernel: LNetError: 115014:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 08:46:40 fir-io7-s1 kernel: LNetError: 115014:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 12 08:47:25 fir-io7-s1 kernel: LNetError: 115368:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 08:47:25 fir-io7-s1 kernel: LNetError: 115368:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 08:50:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 7 seconds Mar 12 08:50:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 537 previous similar messages Mar 12 08:56:40 fir-io7-s1 kernel: LNetError: 115336:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 08:56:40 fir-io7-s1 kernel: LNetError: 115336:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 12 08:57:25 fir-io7-s1 kernel: LNetError: 115368:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 08:57:25 fir-io7-s1 kernel: LNetError: 115368:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 09:00:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 12 09:00:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 567 previous similar messages Mar 12 09:06:40 fir-io7-s1 kernel: LNetError: 115368:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 09:06:40 fir-io7-s1 kernel: LNetError: 115368:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 12 09:07:25 fir-io7-s1 kernel: LNetError: 116079:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 09:07:25 fir-io7-s1 kernel: LNetError: 116079:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 09:10:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 2 seconds Mar 12 09:10:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 825 previous similar messages Mar 12 09:16:40 fir-io7-s1 kernel: LNetError: 116079:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 09:16:40 fir-io7-s1 kernel: LNetError: 116079:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 72 previous similar messages Mar 12 09:17:35 fir-io7-s1 kernel: LNetError: 116427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 09:17:35 fir-io7-s1 kernel: LNetError: 116427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 09:20:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 12 09:20:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 756 previous similar messages Mar 12 09:26:45 fir-io7-s1 kernel: LNetError: 116427:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 09:26:45 fir-io7-s1 kernel: LNetError: 116427:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 84 previous similar messages Mar 12 09:27:45 fir-io7-s1 kernel: LNetError: 116779:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 09:27:45 fir-io7-s1 kernel: LNetError: 116779:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 09:30:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 12 09:30:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 796 previous similar messages Mar 12 09:36:50 fir-io7-s1 kernel: LNetError: 116779:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 09:36:50 fir-io7-s1 kernel: LNetError: 116779:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 91 previous similar messages Mar 12 09:37:45 fir-io7-s1 kernel: LNetError: 117128:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 09:37:45 fir-io7-s1 kernel: LNetError: 117128:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 09:40:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 2 seconds Mar 12 09:40:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 727 previous similar messages Mar 12 09:47:00 fir-io7-s1 kernel: LNetError: 117128:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 09:47:00 fir-io7-s1 kernel: LNetError: 117128:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 104 previous similar messages Mar 12 09:47:55 fir-io7-s1 kernel: LNetError: 117487:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 09:47:55 fir-io7-s1 kernel: LNetError: 117487:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 12 09:50:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 09:50:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 626 previous similar messages Mar 12 09:57:04 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 09:57:04 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages Mar 12 09:57:55 fir-io7-s1 kernel: LNetError: 117487:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 09:57:55 fir-io7-s1 kernel: LNetError: 117487:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 297 previous similar messages Mar 12 10:01:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 7 seconds Mar 12 10:01:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 593 previous similar messages Mar 12 10:07:10 fir-io7-s1 kernel: LNetError: 1368:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 10:07:10 fir-io7-s1 kernel: LNetError: 1368:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 117 previous similar messages Mar 12 10:07:55 fir-io7-s1 kernel: LNetError: 118216:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 10:07:55 fir-io7-s1 kernel: LNetError: 118216:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 10:09:36 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 5be5978c-6909-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4ca2740800, cur 1584032976 expire 1584032826 last 1584032749 Mar 12 10:09:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 10:10:30 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 10:10:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 10:11:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 10:11:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 581 previous similar messages Mar 12 10:17:10 fir-io7-s1 kernel: LNetError: 116041:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 10:17:10 fir-io7-s1 kernel: LNetError: 116041:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 12 10:17:55 fir-io7-s1 kernel: LNetError: 117488:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 10:17:55 fir-io7-s1 kernel: LNetError: 117488:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 12 10:21:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 10:21:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 525 previous similar messages Mar 12 10:27:10 fir-io7-s1 kernel: LNetError: 118820:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 10:27:10 fir-io7-s1 kernel: LNetError: 118820:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 12 10:28:00 fir-io7-s1 kernel: LNetError: 118582:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 10:28:00 fir-io7-s1 kernel: LNetError: 118582:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 10:31:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 12 10:31:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 552 previous similar messages Mar 12 10:37:10 fir-io7-s1 kernel: LNetError: 119202:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 10:37:10 fir-io7-s1 kernel: LNetError: 119202:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 12 10:38:00 fir-io7-s1 kernel: LNetError: 118927:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 10:38:00 fir-io7-s1 kernel: LNetError: 118927:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 12 10:41:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 10:41:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 411 previous similar messages Mar 12 10:47:10 fir-io7-s1 kernel: LNetError: 119475:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 10:47:10 fir-io7-s1 kernel: LNetError: 119475:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 12 10:48:10 fir-io7-s1 kernel: LNetError: 119475:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 10:48:10 fir-io7-s1 kernel: LNetError: 119475:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 12 10:51:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 1 seconds Mar 12 10:51:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 481 previous similar messages Mar 12 10:51:34 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 06263ade-43cf-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79cd34b000, cur 1584035494 expire 1584035344 last 1584035267 Mar 12 10:51:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 10:52:14 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 10:52:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 10:57:15 fir-io7-s1 kernel: LNetError: 119475:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 10:57:15 fir-io7-s1 kernel: LNetError: 119475:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 12 10:58:15 fir-io7-s1 kernel: LNetError: 119475:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 10:58:15 fir-io7-s1 kernel: LNetError: 119475:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 297 previous similar messages Mar 12 11:01:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 4 seconds Mar 12 11:01:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 416 previous similar messages Mar 12 11:07:15 fir-io7-s1 kernel: LNetError: 119475:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 11:07:15 fir-io7-s1 kernel: LNetError: 119475:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 12 11:08:15 fir-io7-s1 kernel: LNetError: 120335:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 11:08:15 fir-io7-s1 kernel: LNetError: 120335:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 12 11:11:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 12 11:11:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 517 previous similar messages Mar 12 11:17:20 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 11:17:20 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 12 11:18:15 fir-io7-s1 kernel: LNetError: 120619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 11:18:15 fir-io7-s1 kernel: LNetError: 120619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 12 11:21:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 12 11:21:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 481 previous similar messages Mar 12 11:27:25 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 11:27:25 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 12 11:28:25 fir-io7-s1 kernel: LNetError: 120619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 11:28:25 fir-io7-s1 kernel: LNetError: 120619:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 11:31:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 12 11:31:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 449 previous similar messages Mar 12 11:37:30 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 11:37:30 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 12 11:38:25 fir-io7-s1 kernel: LNetError: 121388:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 11:38:25 fir-io7-s1 kernel: LNetError: 121388:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 11:41:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 12 11:41:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 531 previous similar messages Mar 12 11:47:30 fir-io7-s1 kernel: LNetError: 121388:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 11:47:30 fir-io7-s1 kernel: LNetError: 121388:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 12 11:48:30 fir-io7-s1 kernel: LNetError: 121388:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 11:48:30 fir-io7-s1 kernel: LNetError: 121388:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 11:51:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 12 11:51:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 549 previous similar messages Mar 12 11:57:35 fir-io7-s1 kernel: LNetError: 121776:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 11:57:35 fir-io7-s1 kernel: LNetError: 121776:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 12 11:58:30 fir-io7-s1 kernel: LNetError: 122086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 11:58:30 fir-io7-s1 kernel: LNetError: 122086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 12:00:25 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client bdbf3ca1-16cb-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f0a9c00, cur 1584039625 expire 1584039475 last 1584039398 Mar 12 12:00:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 12:00:32 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client bdbf3ca1-16cb-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c59fe2a8400, cur 1584039632 expire 1584039482 last 1584039405 Mar 12 12:00:32 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 12:01:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 12:01:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 12:01:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 12:01:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 594 previous similar messages Mar 12 12:07:35 fir-io7-s1 kernel: LNetError: 122197:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 12:07:35 fir-io7-s1 kernel: LNetError: 122197:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 12 12:08:30 fir-io7-s1 kernel: LNetError: 122086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 12:08:30 fir-io7-s1 kernel: LNetError: 122086:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 12:11:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 2 seconds Mar 12 12:11:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 573 previous similar messages Mar 12 12:17:40 fir-io7-s1 kernel: LNetError: 122796:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 12:17:40 fir-io7-s1 kernel: LNetError: 122796:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 12 12:18:40 fir-io7-s1 kernel: LNetError: 122796:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 12:18:40 fir-io7-s1 kernel: LNetError: 122796:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 12:22:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 6 seconds Mar 12 12:22:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 389 previous similar messages Mar 12 12:27:45 fir-io7-s1 kernel: LNetError: 122796:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 12:27:45 fir-io7-s1 kernel: LNetError: 122796:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 12 12:28:40 fir-io7-s1 kernel: LNetError: 123158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 12:28:40 fir-io7-s1 kernel: LNetError: 123158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 12:32:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 12 12:32:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 529 previous similar messages Mar 12 12:37:45 fir-io7-s1 kernel: LNetError: 122950:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 12:37:45 fir-io7-s1 kernel: LNetError: 122950:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 12 12:38:50 fir-io7-s1 kernel: LNetError: 123485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 12:38:50 fir-io7-s1 kernel: LNetError: 123485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 12 12:42:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 12 12:42:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 554 previous similar messages Mar 12 12:47:45 fir-io7-s1 kernel: LNetError: 123790:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 12:47:45 fir-io7-s1 kernel: LNetError: 123790:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 12 12:48:50 fir-io7-s1 kernel: LNetError: 123485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 12:48:50 fir-io7-s1 kernel: LNetError: 123485:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 12:52:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 1 seconds Mar 12 12:52:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 488 previous similar messages Mar 12 12:57:45 fir-io7-s1 kernel: LNetError: 123897:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 12:57:45 fir-io7-s1 kernel: LNetError: 123897:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 12 12:58:50 fir-io7-s1 kernel: LNetError: 124211:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 12:58:50 fir-io7-s1 kernel: LNetError: 124211:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 13:02:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 12 13:02:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 353 previous similar messages Mar 12 13:07:45 fir-io7-s1 kernel: LNetError: 124211:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 13:07:45 fir-io7-s1 kernel: LNetError: 124211:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 12 13:08:55 fir-io7-s1 kernel: LNetError: 124576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 13:08:55 fir-io7-s1 kernel: LNetError: 124576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 12 13:12:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 12 13:12:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 446 previous similar messages Mar 12 13:12:42 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client cf899337-e13a-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4a4723a800, cur 1584043962 expire 1584043812 last 1584043735 Mar 12 13:12:42 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 12 13:13:24 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 13:13:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 13:17:45 fir-io7-s1 kernel: LNetError: 124409:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 13:17:45 fir-io7-s1 kernel: LNetError: 124409:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 12 13:18:55 fir-io7-s1 kernel: LNetError: 124576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 13:18:55 fir-io7-s1 kernel: LNetError: 124576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 12 13:22:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 12 13:22:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 235 previous similar messages Mar 12 13:27:45 fir-io7-s1 kernel: LNetError: 124973:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 13:27:45 fir-io7-s1 kernel: LNetError: 124973:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 12 13:28:55 fir-io7-s1 kernel: LNetError: 125277:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 13:28:55 fir-io7-s1 kernel: LNetError: 125277:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 12 13:32:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 4 seconds Mar 12 13:32:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 336 previous similar messages Mar 12 13:37:45 fir-io7-s1 kernel: LNetError: 125277:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 13:37:45 fir-io7-s1 kernel: LNetError: 125277:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 12 13:38:55 fir-io7-s1 kernel: LNetError: 125629:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 13:38:55 fir-io7-s1 kernel: LNetError: 125629:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 13:42:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 13:42:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages Mar 12 13:43:32 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client c5479be0-dfb0-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c88d07df000, cur 1584045812 expire 1584045662 last 1584045585 Mar 12 13:43:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 13:44:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 13:44:09 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 13:47:45 fir-io7-s1 kernel: LNetError: 124952:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 13:47:45 fir-io7-s1 kernel: LNetError: 124952:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 12 13:48:55 fir-io7-s1 kernel: LNetError: 125894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 13:48:55 fir-io7-s1 kernel: LNetError: 125894:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 12 13:53:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 12 13:53:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 393 previous similar messages Mar 12 13:57:45 fir-io7-s1 kernel: LNetError: 126237:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 13:57:45 fir-io7-s1 kernel: LNetError: 126237:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 12 13:59:05 fir-io7-s1 kernel: LNetError: 126135:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 13:59:05 fir-io7-s1 kernel: LNetError: 126135:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 12 14:03:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 12 14:03:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 407 previous similar messages Mar 12 14:07:45 fir-io7-s1 kernel: LNetError: 126394:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 14:07:45 fir-io7-s1 kernel: LNetError: 126394:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 12 14:09:05 fir-io7-s1 kernel: LNetError: 126707:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 14:09:05 fir-io7-s1 kernel: LNetError: 126707:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 14:13:08 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 14:13:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 14:13:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 5 seconds Mar 12 14:13:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 12 14:17:50 fir-io7-s1 kernel: LNetError: 126962:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 14:17:50 fir-io7-s1 kernel: LNetError: 126962:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 12 14:19:05 fir-io7-s1 kernel: LNetError: 126962:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 14:19:05 fir-io7-s1 kernel: LNetError: 126962:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 12 14:23:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 12 14:23:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 443 previous similar messages Mar 12 14:23:34 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698e4b6800, cur 1584048214 expire 1584048064 last 1584047987 Mar 12 14:23:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 14:23:35 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c5816afcc00, cur 1584048215 expire 1584048065 last 1584047988 Mar 12 14:23:36 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f084800, cur 1584048216 expire 1584048066 last 1584047989 Mar 12 14:23:36 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 12 14:23:57 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 14:23:57 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:24:13 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 12 14:24:13 fir-io7-s1 kernel: LustreError: Skipped 3 previous similar messages Mar 12 14:24:38 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 14:24:38 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 14:24:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 14:24:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 14:24:50 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client b3b43261-e5ed-4 (at 10.50.9.27@o2ib2) in 166 seconds. I think it's dead, and I am evicting it. exp ffff9c88e57d7000, cur 1584048290 expire 1584048140 last 1584048124 Mar 12 14:25:51 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client b3b43261-e5ed-4 (at 10.50.9.27@o2ib2) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c7319cf5c00, cur 1584048351 expire 1584048201 last 1584048124 Mar 12 14:25:51 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 12 14:26:27 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 14:26:27 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 12 14:27:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 14:27:06 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 14:27:06 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:27:30 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004d_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 12 14:27:30 fir-io7-s1 kernel: LustreError: Skipped 1 previous similar message Mar 12 14:27:55 fir-io7-s1 kernel: Lustre: fir-OST004a: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 14:27:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 14:27:55 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:27:55 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:27:55 fir-io7-s1 kernel: LNetError: 126962:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 14:27:55 fir-io7-s1 kernel: LNetError: 126962:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 12 14:29:10 fir-io7-s1 kernel: LNetError: 127412:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 14:29:10 fir-io7-s1 kernel: LNetError: 127412:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 14:29:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 14:29:42 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 14:29:42 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:29:42 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:30:20 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004d_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 12 14:30:26 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 14:30:26 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 14:30:26 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:30:26 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 14:33:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 12 14:33:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 337 previous similar messages Mar 12 14:38:00 fir-io7-s1 kernel: LNetError: 127412:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 14:38:00 fir-io7-s1 kernel: LNetError: 127412:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 12 14:39:10 fir-io7-s1 kernel: LNetError: 127757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 14:39:10 fir-io7-s1 kernel: LNetError: 127757:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 12 14:43:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 12 14:43:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 296 previous similar messages Mar 12 14:48:00 fir-io7-s1 kernel: LNetError: 126677:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 14:48:00 fir-io7-s1 kernel: LNetError: 126677:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 12 14:49:11 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 14:49:11 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 12 14:53:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 14:53:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 374 previous similar messages Mar 12 14:58:00 fir-io7-s1 kernel: LNetError: 128453:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 14:58:00 fir-io7-s1 kernel: LNetError: 128453:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 12 14:59:26 fir-io7-s1 kernel: LNetError: 128213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 14:59:26 fir-io7-s1 kernel: LNetError: 128213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 305 previous similar messages Mar 12 15:04:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 7 seconds Mar 12 15:04:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages Mar 12 15:08:00 fir-io7-s1 kernel: LNetError: 128721:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 15:08:00 fir-io7-s1 kernel: LNetError: 128721:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 12 15:08:43 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 02f1d320-c50e-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a3a212c00, cur 1584050923 expire 1584050773 last 1584050696 Mar 12 15:08:43 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 15:09:21 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 15:09:21 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 15:09:30 fir-io7-s1 kernel: LNetError: 128721:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 15:09:30 fir-io7-s1 kernel: LNetError: 128721:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 12 15:12:32 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 4b609dd1-7c22-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c69929a4400, cur 1584051152 expire 1584051002 last 1584050925 Mar 12 15:12:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 15:13:05 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 15:13:05 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 15:13:05 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 12 15:14:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 5 seconds Mar 12 15:14:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 353 previous similar messages Mar 12 15:18:05 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 15:18:05 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 12 15:18:10 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client fa2a2088-beca-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4c55f1e400, cur 1584051490 expire 1584051340 last 1584051263 Mar 12 15:18:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 15:18:48 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 15:18:48 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 15:19:35 fir-io7-s1 kernel: LNetError: 128721:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 15:19:35 fir-io7-s1 kernel: LNetError: 128721:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 12 15:24:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 6 seconds Mar 12 15:24:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 285 previous similar messages Mar 12 15:28:05 fir-io7-s1 kernel: LNetError: 129506:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 15:28:05 fir-io7-s1 kernel: LNetError: 129506:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 12 15:29:45 fir-io7-s1 kernel: LNetError: 129506:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 15:29:45 fir-io7-s1 kernel: LNetError: 129506:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 12 15:34:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 12 15:34:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 360 previous similar messages Mar 12 15:38:10 fir-io7-s1 kernel: LNetError: 129840:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 15:38:10 fir-io7-s1 kernel: LNetError: 129840:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 12 15:39:45 fir-io7-s1 kernel: LNetError: 129840:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 15:39:45 fir-io7-s1 kernel: LNetError: 129840:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 15:42:23 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client a25d8b5f-d9de-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7319cf5800, cur 1584052943 expire 1584052793 last 1584052716 Mar 12 15:42:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 15:43:16 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 15:43:16 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 15:44:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 12 15:44:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 384 previous similar messages Mar 12 15:44:56 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 8abbbcd1-5db6-4 (at 10.50.2.19@o2ib2) Mar 12 15:44:56 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 15:48:10 fir-io7-s1 kernel: LNetError: 130222:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 15:48:10 fir-io7-s1 kernel: LNetError: 130222:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 12 15:49:55 fir-io7-s1 kernel: LNetError: 130222:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 15:49:55 fir-io7-s1 kernel: LNetError: 130222:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 12 15:54:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 12 15:54:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 316 previous similar messages Mar 12 15:58:10 fir-io7-s1 kernel: LNetError: 129905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 15:58:10 fir-io7-s1 kernel: LNetError: 129905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 12 15:59:49 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 38d87322-86be-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6a1efe3000, cur 1584053989 expire 1584053839 last 1584053762 Mar 12 15:59:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 15:59:55 fir-io7-s1 kernel: LNetError: 130222:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 15:59:55 fir-io7-s1 kernel: LNetError: 130222:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 291 previous similar messages Mar 12 16:00:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 16:00:22 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 16:04:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 16:04:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 151 previous similar messages Mar 12 16:08:15 fir-io7-s1 kernel: LNetError: 130905:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 16:08:15 fir-io7-s1 kernel: LNetError: 130905:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 12 16:10:00 fir-io7-s1 kernel: LNetError: 130905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 16:10:00 fir-io7-s1 kernel: LNetError: 130905:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 16:14:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 16:14:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 233 previous similar messages Mar 12 16:18:15 fir-io7-s1 kernel: LNetError: 380:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 16:18:15 fir-io7-s1 kernel: LNetError: 380:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 12 16:20:00 fir-io7-s1 kernel: LNetError: 380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 16:20:00 fir-io7-s1 kernel: LNetError: 380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 12 16:26:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 12 16:26:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 126 previous similar messages Mar 12 16:26:47 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 406586a0-e725-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4a592dac00, cur 1584055607 expire 1584055457 last 1584055380 Mar 12 16:26:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 16:27:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 16:27:43 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 16:28:16 fir-io7-s1 kernel: LNetError: 130939:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 16:28:16 fir-io7-s1 kernel: LNetError: 130939:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 148 previous similar messages Mar 12 16:30:01 fir-io7-s1 kernel: LNetError: 913:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 16:30:01 fir-io7-s1 kernel: LNetError: 913:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 12 16:34:16 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 2099dd22-f3be-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8733ed2c00, cur 1584056056 expire 1584055906 last 1584055829 Mar 12 16:34:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 16:35:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 16:35:00 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 16:36:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 12 16:36:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 19 previous similar messages Mar 12 16:38:16 fir-io7-s1 kernel: LNetError: 1262:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 16:38:16 fir-io7-s1 kernel: LNetError: 1262:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 126 previous similar messages Mar 12 16:40:06 fir-io7-s1 kernel: LNetError: 1262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 16:40:06 fir-io7-s1 kernel: LNetError: 1262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 16:46:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 12 16:46:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 201 previous similar messages Mar 12 16:48:25 fir-io7-s1 kernel: LNetError: 1262:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 16:48:25 fir-io7-s1 kernel: LNetError: 1262:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 12 16:50:20 fir-io7-s1 kernel: LNetError: 1262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 16:50:20 fir-io7-s1 kernel: LNetError: 1262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 16:55:21 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 0371c58f-e47a-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8881e24800, cur 1584057321 expire 1584057171 last 1584057094 Mar 12 16:55:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 16:56:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 16:56:05 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 16:56:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 4 seconds Mar 12 16:56:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 374 previous similar messages Mar 12 16:58:25 fir-io7-s1 kernel: LNetError: 1769:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 16:58:25 fir-io7-s1 kernel: LNetError: 1769:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 12 17:00:25 fir-io7-s1 kernel: LNetError: 1969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 17:00:25 fir-io7-s1 kernel: LNetError: 1969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 12 17:06:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 12 17:06:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 397 previous similar messages Mar 12 17:08:25 fir-io7-s1 kernel: LNetError: 1857:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 17:08:25 fir-io7-s1 kernel: LNetError: 1857:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 12 17:08:54 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 30d41e9e-2801-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990f70400, cur 1584058134 expire 1584057984 last 1584057907 Mar 12 17:08:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 17:09:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 17:09:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 17:10:30 fir-io7-s1 kernel: LNetError: 1969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 17:10:30 fir-io7-s1 kernel: LNetError: 1969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 17:17:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 12 17:17:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 391 previous similar messages Mar 12 17:18:25 fir-io7-s1 kernel: LNetError: 2695:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 17:18:25 fir-io7-s1 kernel: LNetError: 2695:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 12 17:20:35 fir-io7-s1 kernel: LNetError: 2586:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 17:20:35 fir-io7-s1 kernel: LNetError: 2586:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 17:22:35 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 0d2327f5-de66-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c51af4e0000, cur 1584058955 expire 1584058805 last 1584058728 Mar 12 17:22:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 17:23:10 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 17:23:10 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 17:27:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 12 17:27:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 245 previous similar messages Mar 12 17:28:25 fir-io7-s1 kernel: LNetError: 2586:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 17:28:25 fir-io7-s1 kernel: LNetError: 2586:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 12 17:30:35 fir-io7-s1 kernel: LNetError: 119253:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 17:30:35 fir-io7-s1 kernel: LNetError: 119253:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 12 17:37:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 17:37:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 285 previous similar messages Mar 12 17:38:25 fir-io7-s1 kernel: LNetError: 3321:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 17:38:25 fir-io7-s1 kernel: LNetError: 3321:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 12 17:38:46 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 9e5d3deb-afde-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8a25359800, cur 1584059926 expire 1584059776 last 1584059699 Mar 12 17:38:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 17:39:30 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 17:39:30 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 17:40:35 fir-io7-s1 kernel: LNetError: 3042:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 17:40:35 fir-io7-s1 kernel: LNetError: 3042:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 328 previous similar messages Mar 12 17:47:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 12 17:47:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 256 previous similar messages Mar 12 17:48:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 17:48:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 12 17:50:45 fir-io7-s1 kernel: LNetError: 3496:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 17:50:45 fir-io7-s1 kernel: LNetError: 3496:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 316 previous similar messages Mar 12 17:57:06 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client ce6d0dfe-9d94-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8733ed5000, cur 1584061026 expire 1584060876 last 1584060799 Mar 12 17:57:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 17:57:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 12 17:57:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 783 previous similar messages Mar 12 17:58:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 17:58:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 17:58:30 fir-io7-s1 kernel: LNetError: 3915:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 17:58:30 fir-io7-s1 kernel: LNetError: 3915:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 12 18:00:37 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:00:37 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:00:41 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Mar 12 18:00:41 fir-io7-s1 kernel: LustreError: Skipped 1 previous similar message Mar 12 18:00:50 fir-io7-s1 kernel: LNetError: 3818:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 18:00:50 fir-io7-s1 kernel: LNetError: 3818:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 12 18:01:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:01:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:03:28 fir-io7-s1 kernel: LustreError: 137-5: fir-OST0049_UUID: not available for connect from 10.50.0.1@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 12 18:03:28 fir-io7-s1 kernel: LustreError: Skipped 5 previous similar messages Mar 12 18:03:52 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:03:52 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:03:52 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 18:03:52 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 18:05:40 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:05:40 fir-io7-s1 kernel: Lustre: fir-OST004c: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:05:40 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:05:40 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 18:05:40 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 18:06:24 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:06:24 fir-io7-s1 kernel: Lustre: Skipped 9 previous similar messages Mar 12 18:07:25 fir-io7-s1 
kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 12 18:07:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 561 previous similar messages Mar 12 18:07:43 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:08:14 fir-io7-s1 kernel: Lustre: fir-OST004c: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:08:14 fir-io7-s1 kernel: Lustre: Skipped 9 previous similar messages Mar 12 18:08:30 fir-io7-s1 kernel: LNetError: 4167:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 18:08:30 fir-io7-s1 kernel: LNetError: 4167:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 12 18:10:55 fir-io7-s1 kernel: LNetError: 4455:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 18:10:55 fir-io7-s1 kernel: LNetError: 4455:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 12 18:11:44 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 12 18:11:44 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 12 18:12:31 fir-io7-s1 kernel: Lustre: fir-OST0048: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 12 18:12:31 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 12 18:17:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 1 seconds Mar 12 18:17:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 12 18:18:35 fir-io7-s1 kernel: LNetError: 4793:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 18:18:35 fir-io7-s1 kernel: LNetError: 4793:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 12 18:20:55 fir-io7-s1 kernel: LNetError: 4793:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 18:20:55 fir-io7-s1 kernel: LNetError: 4793:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 12 18:27:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 18:27:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 486 previous similar messages Mar 12 18:28:35 fir-io7-s1 kernel: LNetError: 4793:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 18:28:35 fir-io7-s1 kernel: LNetError: 4793:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 131 previous similar messages Mar 12 18:30:31 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 6071d03e-c75d-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c773ef62c00, cur 1584063031 expire 1584062881 last 1584062804 Mar 12 18:30:31 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 18:31:00 fir-io7-s1 kernel: LNetError: 5181:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 18:31:00 fir-io7-s1 kernel: LNetError: 5181:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 12 18:31:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 18:31:10 fir-io7-s1 kernel: Lustre: Skipped 15 previous similar messages Mar 12 18:37:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 18:37:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 540 previous similar messages Mar 12 18:38:45 fir-io7-s1 kernel: LNetError: 5181:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 18:38:45 fir-io7-s1 kernel: LNetError: 5181:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 12 18:41:05 fir-io7-s1 kernel: LNetError: 5535:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 18:41:05 fir-io7-s1 kernel: LNetError: 5535:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 12 18:46:03 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 545aef2a-dacf-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7f3c55e800, cur 1584063963 expire 1584063813 last 1584063736 Mar 12 18:46:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 18:47:00 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 18:47:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 18:47:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 12 18:47:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 652 previous similar messages Mar 12 18:48:45 fir-io7-s1 kernel: LNetError: 5535:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 18:48:45 fir-io7-s1 kernel: LNetError: 5535:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 104 previous similar messages Mar 12 18:51:10 fir-io7-s1 kernel: LNetError: 5912:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 18:51:10 fir-io7-s1 kernel: LNetError: 5912:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 12 18:58:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 12 18:58:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 648 previous similar messages Mar 12 18:58:50 fir-io7-s1 kernel: LNetError: 5912:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 18:58:50 fir-io7-s1 kernel: LNetError: 5912:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 136 previous similar messages Mar 12 19:01:15 fir-io7-s1 kernel: LNetError: 6260:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 19:01:15 fir-io7-s1 kernel: LNetError: 6260:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 316 previous similar messages Mar 12 19:08:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 12 19:08:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 631 previous similar messages Mar 12 19:09:00 fir-io7-s1 kernel: LNetError: 6260:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 19:09:00 fir-io7-s1 kernel: LNetError: 6260:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 155 previous similar messages Mar 12 19:11:15 fir-io7-s1 kernel: LNetError: 6675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 19:11:15 fir-io7-s1 kernel: LNetError: 6675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 325 previous similar messages Mar 12 19:18:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 12 19:18:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 447 previous similar messages Mar 12 19:19:00 fir-io7-s1 kernel: LNetError: 5256:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 19:19:00 fir-io7-s1 kernel: LNetError: 5256:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 12 19:19:42 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 65c91906-c9f9-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c88144e3800, cur 1584065982 expire 1584065832 last 1584065755 Mar 12 19:19:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 19:20:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 19:20:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 19:21:20 fir-io7-s1 kernel: LNetError: 6949:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 19:21:20 fir-io7-s1 kernel: LNetError: 6949:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 12 19:28:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 19:28:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 334 previous similar messages Mar 12 19:29:00 fir-io7-s1 kernel: LNetError: 6910:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 19:29:00 fir-io7-s1 kernel: LNetError: 6910:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 12 19:30:05 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 577f465e-1762-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ae1d7a400, cur 1584066605 expire 1584066455 last 1584066378 Mar 12 19:30:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 19:31:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 19:31:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 19:31:20 fir-io7-s1 kernel: LNetError: 6949:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 19:31:20 fir-io7-s1 kernel: LNetError: 6949:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 325 previous similar messages Mar 12 19:38:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 12 19:38:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 254 previous similar messages Mar 12 19:39:00 fir-io7-s1 kernel: LNetError: 7713:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 19:39:00 fir-io7-s1 kernel: LNetError: 7713:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 12 19:41:25 fir-io7-s1 kernel: LNetError: 7713:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 19:41:25 fir-io7-s1 kernel: LNetError: 7713:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 12 19:48:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 19:48:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 274 previous similar messages Mar 12 19:49:05 fir-io7-s1 kernel: LNetError: 7713:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 19:49:05 fir-io7-s1 kernel: LNetError: 7713:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 12 19:51:30 fir-io7-s1 kernel: LNetError: 8083:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 19:51:30 fir-io7-s1 kernel: LNetError: 8083:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 12 19:58:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 12 19:58:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 459 previous similar messages Mar 12 19:59:05 fir-io7-s1 kernel: LNetError: 7725:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 19:59:05 fir-io7-s1 kernel: LNetError: 7725:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 12 20:01:19 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 7534bce5-6c61-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79a4631000, cur 1584068479 expire 1584068329 last 1584068252 Mar 12 20:01:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:01:30 fir-io7-s1 kernel: LNetError: 8381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 20:01:30 fir-io7-s1 kernel: LNetError: 8381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 12 20:02:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 20:02:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:08:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 20:08:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 297 previous similar messages Mar 12 20:09:05 fir-io7-s1 kernel: LNetError: 6910:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 20:09:05 fir-io7-s1 kernel: LNetError: 6910:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 12 20:11:35 fir-io7-s1 kernel: LNetError: 8381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 20:11:35 fir-io7-s1 kernel: LNetError: 8381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 374 previous similar messages Mar 12 20:15:59 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 78a2a529-274b-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6b0146cc00, cur 1584069359 expire 1584069209 last 1584069132 Mar 12 20:15:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:16:24 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 20:16:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:18:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 20:18:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 213 previous similar messages Mar 12 20:19:05 fir-io7-s1 kernel: LNetError: 8889:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 20:19:05 fir-io7-s1 kernel: LNetError: 8889:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 12 20:21:40 fir-io7-s1 kernel: LNetError: 9159:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 20:21:40 fir-io7-s1 kernel: LNetError: 9159:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 12 20:29:05 fir-io7-s1 kernel: LNetError: 9159:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 20:29:05 fir-io7-s1 kernel: LNetError: 9159:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 12 20:30:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 12 20:30:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 244 previous similar messages Mar 12 20:31:40 fir-io7-s1 kernel: LNetError: 9508:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 20:31:40 fir-io7-s1 kernel: LNetError: 9508:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 12 20:38:50 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 513fd73c-6a06-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c51afa55c00, cur 1584070730 expire 1584070580 last 1584070503 Mar 12 20:38:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:39:05 fir-io7-s1 kernel: LNetError: 9508:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 20:39:05 fir-io7-s1 kernel: LNetError: 9508:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 12 20:39:46 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 20:39:46 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 20:40:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 12 20:40:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 226 previous similar messages Mar 12 20:41:45 fir-io7-s1 kernel: LNetError: 9865:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 20:41:45 fir-io7-s1 kernel: LNetError: 9865:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 396 previous similar messages Mar 12 20:46:32 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 1ef7efeb-6a9e-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c56a8cc0800, cur 1584071192 expire 1584071042 last 1584070965 Mar 12 20:46:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:47:03 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 20:47:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 20:49:05 fir-io7-s1 kernel: LNetError: 9865:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 20:49:05 fir-io7-s1 kernel: LNetError: 9865:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 12 20:50:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 12 20:50:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 253 previous similar messages Mar 12 20:51:45 fir-io7-s1 kernel: LNetError: 9865:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 20:51:45 fir-io7-s1 kernel: LNetError: 9865:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 12 20:59:05 fir-io7-s1 kernel: LNetError: 10311:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 20:59:05 fir-io7-s1 kernel: LNetError: 10311:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 12 21:00:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 12 21:00:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 124 previous similar messages Mar 12 21:01:45 fir-io7-s1 kernel: LNetError: 10578:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 21:01:45 fir-io7-s1 kernel: LNetError: 10578:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 12 21:09:10 fir-io7-s1 kernel: LNetError: 10902:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 21:09:10 fir-io7-s1 kernel: LNetError: 10902:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 12 21:10:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 1 seconds Mar 12 21:10:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 223 previous similar messages Mar 12 21:11:50 fir-io7-s1 kernel: LNetError: 10902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 21:11:50 fir-io7-s1 kernel: LNetError: 10902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 398 previous similar messages Mar 12 21:19:10 fir-io7-s1 kernel: LNetError: 10902:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 21:19:10 fir-io7-s1 kernel: LNetError: 10902:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 12 21:20:21 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 8a974271-bbb9-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5a285bbc00, cur 1584073221 expire 1584073071 last 1584072994 Mar 12 21:20:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 21:20:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 12 21:20:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 196 previous similar messages Mar 12 21:21:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 21:21:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 21:21:50 fir-io7-s1 kernel: LNetError: 10902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 21:21:50 fir-io7-s1 kernel: LNetError: 10902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 12 21:26:23 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 52703429-1cd4-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4d8789c400, cur 1584073583 expire 1584073433 last 1584073356 Mar 12 21:26:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 21:26:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 21:26:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 21:29:10 fir-io7-s1 kernel: LNetError: 11666:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 21:29:10 fir-io7-s1 kernel: LNetError: 11666:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 12 21:30:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 12 21:30:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 188 previous similar messages Mar 12 21:31:50 fir-io7-s1 kernel: LNetError: 11666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 12 21:31:50 fir-io7-s1 kernel: LNetError: 11666:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 12 21:39:10 fir-io7-s1 kernel: LNetError: 11666:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 21:39:10 fir-io7-s1 kernel: LNetError: 11666:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 12 21:41:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 21:41:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 589 previous similar messages Mar 12 21:41:50 fir-io7-s1 kernel: LNetError: 12036:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 21:41:50 fir-io7-s1 kernel: LNetError: 12036:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 401 previous similar messages Mar 12 21:49:10 fir-io7-s1 kernel: LNetError: 12303:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 21:49:10 fir-io7-s1 kernel: LNetError: 12303:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 12 21:51:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 12 21:51:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 328 previous similar messages Mar 12 21:51:50 fir-io7-s1 kernel: LNetError: 12303:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 21:51:50 fir-io7-s1 kernel: LNetError: 12303:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 391 previous similar messages Mar 12 21:59:10 fir-io7-s1 kernel: LNetError: 12481:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 21:59:10 fir-io7-s1 kernel: LNetError: 12481:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 12 22:01:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 22:01:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 312 previous similar messages Mar 12 22:01:50 fir-io7-s1 kernel: LNetError: 12859:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 12 22:01:50 fir-io7-s1 kernel: LNetError: 12859:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 392 previous similar messages Mar 12 22:03:22 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 604ecd2d-8fa1-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4a5f41e000, cur 1584075802 expire 1584075652 last 1584075575 Mar 12 22:03:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 22:04:09 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 22:04:09 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 22:09:10 fir-io7-s1 kernel: LNetError: 13127:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 22:09:10 fir-io7-s1 kernel: LNetError: 13127:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages Mar 12 22:10:49 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 2d01376d-9f43-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7e6f36f000, cur 1584076249 expire 1584076099 last 1584076022 Mar 12 22:10:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 22:11:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 12 22:11:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 346 previous similar messages Mar 12 22:11:29 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 22:11:29 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 22:11:50 fir-io7-s1 kernel: LNetError: 13312:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 22:11:50 fir-io7-s1 kernel: LNetError: 13312:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 409 previous similar messages Mar 12 22:19:10 fir-io7-s1 kernel: LNetError: 13312:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 22:19:10 fir-io7-s1 kernel: LNetError: 13312:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 12 22:21:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 12 22:21:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 328 previous similar messages Mar 12 22:21:50 fir-io7-s1 kernel: LNetError: 13312:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 12 22:21:50 fir-io7-s1 kernel: LNetError: 13312:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 12 22:29:10 fir-io7-s1 kernel: LNetError: 13917:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 22:29:10 fir-io7-s1 kernel: LNetError: 13917:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 12 22:31:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 12 22:31:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 311 previous similar messages Mar 12 22:31:55 fir-io7-s1 kernel: LNetError: 13675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 22:31:55 fir-io7-s1 kernel: LNetError: 13675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 381 previous similar messages Mar 12 22:39:10 fir-io7-s1 kernel: LNetError: 12859:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 22:39:10 fir-io7-s1 kernel: LNetError: 12859:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 12 22:39:14 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 12 22:39:14 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 22:39:18 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client f8f3a406-d14e-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c64dde87c00, cur 1584077958 expire 1584077808 last 1584077731 Mar 12 22:39:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 22:40:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 22:40:05 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 22:41:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 12 22:41:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 343 previous similar messages Mar 12 22:42:00 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 12 22:42:00 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 372 previous similar messages Mar 12 22:42:12 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 12 22:42:12 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 22:43:27 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 845fae0c-54ef-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6b34ef8400, cur 1584078207 expire 1584078057 last 1584077980 Mar 12 22:43:27 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 12 22:49:10 fir-io7-s1 kernel: LNetError: 14298:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 22:49:10 fir-io7-s1 kernel: LNetError: 14298:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 12 22:51:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 12 22:51:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 182 previous similar messages Mar 12 22:52:00 fir-io7-s1 kernel: LNetError: 14654:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 22:52:00 fir-io7-s1 kernel: LNetError: 14654:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages Mar 12 22:55:27 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 967c2c6a-16f9-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e679c00, cur 1584078927 expire 1584078777 last 1584078700 Mar 12 22:55:27 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 22:56:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 22:56:09 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 22:56:09 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 12 22:56:09 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 22:59:10 fir-io7-s1 kernel: LNetError: 14721:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 22:59:10 fir-io7-s1 kernel: LNetError: 14721:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 12 23:01:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 12 23:01:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages Mar 12 23:02:00 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 12 23:02:00 fir-io7-s1 kernel: LNetError: 91381:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 402 previous similar messages Mar 12 23:09:10 fir-io7-s1 kernel: LNetError: 15127:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 23:09:10 fir-io7-s1 kernel: LNetError: 15127:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 12 23:12:05 fir-io7-s1 kernel: LNetError: 15374:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 23:12:05 fir-io7-s1 kernel: LNetError: 15374:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 348 previous similar messages Mar 12 23:12:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 5 seconds Mar 12 23:12:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 252 previous similar messages Mar 12 23:15:50 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client f2870079-5787-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c796a31a800, cur 1584080150 expire 1584080000 last 1584079923 Mar 12 23:15:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 23:16:37 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 23:16:37 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 12 23:16:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 12 23:16:43 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 12 23:19:10 fir-io7-s1 kernel: LNetError: 15374:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 23:19:10 fir-io7-s1 kernel: LNetError: 15374:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 12 23:22:05 fir-io7-s1 kernel: LNetError: 15728:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 12 23:22:05 fir-io7-s1 kernel: LNetError: 15728:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 12 23:22:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 12 23:22:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 355 previous similar messages Mar 12 23:29:15 fir-io7-s1 kernel: LNetError: 15728:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 23:29:15 fir-io7-s1 kernel: LNetError: 15728:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 12 23:32:20 fir-io7-s1 kernel: LNetError: 16082:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 23:32:20 fir-io7-s1 kernel: LNetError: 16082:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 12 23:32:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 12 23:32:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 256 previous similar messages Mar 12 23:32:33 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client d9e0bd51-7019-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4c05acf400, cur 1584081153 expire 1584081003 last 1584080926 Mar 12 23:32:33 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 12 23:33:23 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 12 23:33:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 12 23:39:15 fir-io7-s1 kernel: LNetError: 16082:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 23:39:15 fir-io7-s1 kernel: LNetError: 16082:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages Mar 12 23:42:20 fir-io7-s1 kernel: LNetError: 16430:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 23:42:20 fir-io7-s1 kernel: LNetError: 16430:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 12 23:42:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 12 23:42:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 204 previous similar messages Mar 12 23:49:15 fir-io7-s1 kernel: LNetError: 16620:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 12 23:49:15 fir-io7-s1 kernel: LNetError: 16620:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 12 23:52:20 fir-io7-s1 kernel: LNetError: 16620:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 12 23:52:20 fir-io7-s1 kernel: LNetError: 16620:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages Mar 12 23:52:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 12 23:52:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 337 previous similar messages Mar 12 23:59:15 fir-io7-s1 kernel: LNetError: 16902:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 12 23:59:15 fir-io7-s1 kernel: LNetError: 16902:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 13 00:02:20 fir-io7-s1 kernel: LNetError: 17143:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 00:02:20 fir-io7-s1 kernel: LNetError: 17143:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 402 previous similar messages Mar 13 00:02:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 13 00:02:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 531 previous similar messages Mar 13 00:05:43 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client bcf81856-41d5-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f764800, cur 1584083143 expire 1584082993 last 1584082916 Mar 13 00:05:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 00:06:16 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 00:06:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 00:09:20 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 00:09:20 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 13 00:12:20 fir-io7-s1 kernel: LNetError: 17402:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 00:12:20 fir-io7-s1 kernel: LNetError: 17402:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 395 previous similar messages Mar 13 00:12:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 13 00:12:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 375 previous similar messages Mar 13 00:19:20 fir-io7-s1 kernel: LNetError: 16128:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 00:19:20 fir-io7-s1 kernel: LNetError: 16128:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 230 previous similar messages Mar 13 00:22:20 fir-io7-s1 kernel: LNetError: 17624:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 00:22:20 fir-io7-s1 kernel: LNetError: 17624:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 13 00:22:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 1 seconds Mar 13 00:22:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 295 previous similar messages Mar 13 00:29:20 fir-io7-s1 kernel: LNetError: 18063:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 00:29:20 fir-io7-s1 kernel: LNetError: 18063:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 13 00:32:20 fir-io7-s1 kernel: LNetError: 18392:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 00:32:20 fir-io7-s1 kernel: LNetError: 18392:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 407 previous similar messages Mar 13 00:33:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 00:33:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 67 previous similar messages Mar 13 00:39:20 fir-io7-s1 kernel: LNetError: 18392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 00:39:20 fir-io7-s1 kernel: LNetError: 18392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 13 00:42:20 fir-io7-s1 kernel: LNetError: 18829:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 00:42:20 fir-io7-s1 kernel: LNetError: 18829:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 13 00:43:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 00:43:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 129 previous similar messages Mar 13 00:49:20 fir-io7-s1 kernel: LNetError: 19048:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 00:49:20 fir-io7-s1 kernel: LNetError: 19048:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages Mar 13 00:52:25 fir-io7-s1 kernel: LNetError: 19048:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 00:52:25 fir-io7-s1 kernel: LNetError: 19048:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 13 00:53:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 00:53:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 236 previous similar messages Mar 13 00:55:14 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b9223b3f-970b-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c59f2285400, cur 1584086114 expire 1584085964 last 1584085887 Mar 13 00:55:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 00:56:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 00:56:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 00:59:20 fir-io7-s1 kernel: LNetError: 19313:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 00:59:20 fir-io7-s1 kernel: LNetError: 19313:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 13 01:02:30 fir-io7-s1 kernel: LNetError: 19556:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 01:02:30 fir-io7-s1 kernel: LNetError: 19556:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 407 previous similar messages Mar 13 01:03:13 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client c0aa9b44-03d3-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c569c8e1000, cur 1584086593 expire 1584086443 last 1584086366 Mar 13 01:03:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 01:03:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 13 01:03:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 139 previous similar messages Mar 13 01:04:48 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to a282389e-3a6d-4 (at 10.50.6.54@o2ib2) Mar 13 01:04:48 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 01:09:20 fir-io7-s1 kernel: LNetError: 19556:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 01:09:20 fir-io7-s1 kernel: LNetError: 19556:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 13 01:12:30 fir-io7-s1 kernel: LNetError: 19556:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 01:12:30 fir-io7-s1 kernel: LNetError: 19556:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 13 01:13:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 01:13:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 143 previous similar messages Mar 13 01:19:20 fir-io7-s1 kernel: LNetError: 19556:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 01:19:20 fir-io7-s1 kernel: LNetError: 19556:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 13 01:22:35 fir-io7-s1 kernel: LNetError: 20296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 01:22:35 fir-io7-s1 kernel: LNetError: 20296:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 13 01:23:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 01:23:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 154 previous similar messages Mar 13 01:26:36 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 9a15eded-b146-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698ea3f800, cur 1584087996 expire 1584087846 last 1584087769 Mar 13 01:26:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 01:27:25 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 01:27:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 01:29:24 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 01:29:24 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 13 01:32:40 fir-io7-s1 kernel: LNetError: 20480:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 01:32:40 fir-io7-s1 kernel: LNetError: 20480:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 385 previous similar messages Mar 13 01:34:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 1 seconds Mar 13 01:34:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 128 previous similar messages Mar 13 01:39:25 fir-io7-s1 kernel: LNetError: 20733:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 01:39:25 fir-io7-s1 kernel: LNetError: 20733:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 13 01:42:40 fir-io7-s1 kernel: LNetError: 21000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 01:42:40 fir-io7-s1 kernel: LNetError: 21000:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 13 01:44:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 01:44:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 268 previous similar messages Mar 13 01:45:07 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client a5c47a0e-4a90-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4a5ae66400, cur 1584089107 expire 1584088957 last 1584088880 Mar 13 01:45:07 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 01:46:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 01:46:00 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 01:49:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 01:49:25 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 13 01:52:40 fir-io7-s1 kernel: LNetError: 21251:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 01:52:40 fir-io7-s1 kernel: LNetError: 21251:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 408 previous similar messages Mar 13 01:54:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 13 01:54:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 386 previous similar messages Mar 13 01:59:25 fir-io7-s1 kernel: LNetError: 21251:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 01:59:25 fir-io7-s1 kernel: LNetError: 21251:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 13 02:02:45 fir-io7-s1 kernel: LNetError: 21251:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 02:02:45 fir-io7-s1 kernel: LNetError: 21251:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 13 02:04:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 02:04:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 282 previous similar messages Mar 13 02:08:51 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client cf8db701-b951-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c80982d7800, cur 1584090531 expire 1584090381 last 1584090304 Mar 13 02:08:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 02:09:25 fir-io7-s1 kernel: LNetError: 21970:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 02:09:25 fir-io7-s1 kernel: LNetError: 21970:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 13 02:09:46 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 02:09:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 02:12:45 fir-io7-s1 kernel: LNetError: 22054:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 02:12:45 fir-io7-s1 kernel: LNetError: 22054:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 365 previous similar messages Mar 13 02:14:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 13 02:14:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 351 previous similar messages Mar 13 02:19:25 fir-io7-s1 kernel: LNetError: 21181:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 02:19:25 fir-io7-s1 kernel: LNetError: 21181:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 13 02:22:45 fir-io7-s1 kernel: LNetError: 22054:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 02:22:45 fir-io7-s1 kernel: LNetError: 22054:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 412 previous similar messages Mar 13 02:24:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 02:24:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 170 previous similar messages Mar 13 02:27:26 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client d742632a-e79c-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4ef55df000, cur 1584091646 expire 1584091496 last 1584091419 Mar 13 02:27:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 02:28:11 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 02:28:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 02:29:25 fir-io7-s1 kernel: LNetError: 22343:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 02:29:25 fir-io7-s1 kernel: LNetError: 22343:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 13 02:32:55 fir-io7-s1 kernel: LNetError: 88424:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 02:32:55 fir-io7-s1 kernel: LNetError: 88424:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 414 previous similar messages Mar 13 02:35:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 13 02:35:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 155 previous similar messages Mar 13 02:39:25 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 02:39:25 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 13 02:42:55 fir-io7-s1 kernel: LNetError: 22957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 02:42:55 fir-io7-s1 kernel: LNetError: 22957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 388 previous similar messages Mar 13 02:45:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 02:45:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 184 previous similar messages Mar 13 02:49:25 fir-io7-s1 kernel: LNetError: 22957:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 02:49:25 fir-io7-s1 kernel: LNetError: 22957:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 13 02:52:55 fir-io7-s1 kernel: LNetError: 23478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 02:52:55 fir-io7-s1 kernel: LNetError: 23478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 13 02:55:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 02:55:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 79 previous similar messages Mar 13 02:59:25 fir-io7-s1 kernel: LNetError: 23478:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 02:59:25 fir-io7-s1 kernel: LNetError: 23478:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 13 03:01:40 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client e38564a8-1bec-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4e80ad9000, cur 1584093700 expire 1584093550 last 1584093473 Mar 13 03:01:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 03:02:20 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 03:02:20 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 03:03:00 fir-io7-s1 kernel: LNetError: 23828:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 03:03:00 fir-io7-s1 kernel: LNetError: 23828:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 13 03:05:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 03:05:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 58 previous similar messages Mar 13 03:09:27 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 03:09:27 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 13 03:13:00 fir-io7-s1 kernel: LNetError: 23828:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 03:13:00 fir-io7-s1 kernel: LNetError: 23828:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 408 previous similar messages Mar 13 03:15:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 03:15:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 131 previous similar messages Mar 13 03:19:29 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 03:19:29 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 13 03:22:59 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client e4a7e8a4-86ed-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4f84b96800, cur 1584094979 expire 1584094829 last 1584094752 Mar 13 03:22:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 03:23:00 fir-io7-s1 kernel: LNetError: 24698:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 03:23:00 fir-io7-s1 kernel: LNetError: 24698:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 13 03:23:43 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 03:23:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 03:23:43 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 13 03:23:45 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 03:25:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 13 03:25:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 690 previous similar messages Mar 13 03:29:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 03:29:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 13 03:33:00 fir-io7-s1 kernel: LNetError: 24698:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 03:33:00 fir-io7-s1 kernel: LNetError: 24698:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 418 previous similar messages Mar 13 03:35:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 03:35:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 432 previous similar messages Mar 13 03:37:04 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 21f6486e-eb98-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f1c5d5800, cur 1584095824 expire 1584095674 last 1584095597 Mar 13 03:37:04 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 03:39:30 fir-io7-s1 kernel: LNetError: 25272:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 03:39:30 fir-io7-s1 kernel: LNetError: 25272:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 13 03:40:38 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 03:40:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 03:43:00 fir-io7-s1 kernel: LNetError: 25063:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 03:43:00 fir-io7-s1 kernel: LNetError: 25063:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 13 03:45:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 13 03:45:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages Mar 13 03:49:30 fir-io7-s1 kernel: LNetError: 25432:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 03:49:30 fir-io7-s1 kernel: LNetError: 25432:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 231 previous similar messages Mar 13 03:53:00 fir-io7-s1 kernel: LNetError: 25667:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 03:53:00 fir-io7-s1 kernel: LNetError: 25667:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 410 previous similar messages Mar 13 03:55:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 03:55:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 340 previous similar messages Mar 13 03:58:38 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 09c6c766-2d4b-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c52c8f03400, cur 1584097118 expire 1584096968 last 1584096891 Mar 13 03:58:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 03:59:30 fir-io7-s1 kernel: LNetError: 25667:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 03:59:30 fir-io7-s1 kernel: LNetError: 25667:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 13 04:03:00 fir-io7-s1 kernel: LNetError: 26012:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 04:03:00 fir-io7-s1 kernel: LNetError: 26012:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 414 previous similar messages Mar 13 04:04:17 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 04:04:17 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 04:05:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 04:05:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 117 previous similar messages Mar 13 04:06:22 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 47a08d64-a4dd-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c581ffe9c00, cur 1584097582 expire 1584097432 last 1584097355 Mar 13 04:06:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 04:07:11 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 04:07:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 04:09:30 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 04:09:30 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 13 04:13:00 fir-io7-s1 kernel: LNetError: 26468:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 04:13:00 fir-io7-s1 kernel: LNetError: 26468:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 388 previous similar messages Mar 13 04:14:37 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2d360bb8-fa10-4 (at 10.50.1.51@o2ib2) Mar 13 04:14:37 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 04:15:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds Mar 13 04:15:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 212 previous similar messages Mar 13 04:15:54 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 2d360bb8-fa10-4 (at 10.50.1.51@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4c7e863000, cur 1584098154 expire 1584098004 last 1584097927 Mar 13 04:15:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 04:15:56 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 2d360bb8-fa10-4 (at 10.50.1.51@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6bfe4cc800, cur 1584098156 expire 1584098006 last 1584097929 Mar 13 04:15:56 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 13 04:19:30 fir-io7-s1 kernel: LNetError: 26468:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 04:19:30 fir-io7-s1 kernel: LNetError: 26468:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 13 04:23:05 fir-io7-s1 kernel: LNetError: 26746:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 04:23:05 fir-io7-s1 kernel: LNetError: 26746:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 406 previous similar messages Mar 13 04:24:46 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 8cabe4b9-6b7b-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6eecef9800, cur 1584098686 expire 1584098536 last 1584098459 Mar 13 04:24:46 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 13 04:25:27 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 04:25:27 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 04:25:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 04:25:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 272 previous similar messages Mar 13 04:29:30 fir-io7-s1 kernel: LNetError: 26746:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 04:29:30 fir-io7-s1 kernel: LNetError: 26746:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 13 04:33:05 fir-io7-s1 kernel: LNetError: 26746:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 04:33:05 fir-io7-s1 kernel: LNetError: 26746:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 366 previous similar messages Mar 13 04:36:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 13 04:36:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 117 previous similar messages Mar 13 04:39:35 fir-io7-s1 kernel: LNetError: 26746:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 04:39:35 fir-io7-s1 kernel: LNetError: 26746:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 147 previous similar messages Mar 13 04:43:10 fir-io7-s1 kernel: LNetError: 27448:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 04:43:10 fir-io7-s1 kernel: LNetError: 27448:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 13 04:46:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 04:46:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 98 previous similar messages Mar 13 04:49:35 fir-io7-s1 kernel: LNetError: 27448:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 04:49:35 fir-io7-s1 kernel: LNetError: 27448:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 13 04:49:43 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 287c4e0b-c729-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6992f74000, cur 1584100183 expire 1584100033 last 1584099956 Mar 13 04:49:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 04:50:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 04:50:35 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 13 04:53:10 fir-io7-s1 kernel: LNetError: 27809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 04:53:10 fir-io7-s1 kernel: LNetError: 27809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 407 previous similar messages Mar 13 04:56:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 04:56:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 358 previous similar messages Mar 13 04:59:35 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 04:59:35 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 13 05:03:10 fir-io7-s1 kernel: LNetError: 27809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 05:03:10 fir-io7-s1 kernel: LNetError: 27809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 394 previous similar messages Mar 13 05:06:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 05:06:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 430 previous similar messages Mar 13 05:09:35 fir-io7-s1 kernel: LNetError: 28310:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 05:09:35 fir-io7-s1 kernel: LNetError: 28310:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 13 05:13:10 fir-io7-s1 kernel: LNetError: 28533:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 05:13:10 fir-io7-s1 kernel: LNetError: 28533:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 403 previous similar messages Mar 13 05:16:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 05:16:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 250 previous similar messages Mar 13 05:19:35 fir-io7-s1 kernel: LNetError: 28802:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 05:19:35 fir-io7-s1 kernel: LNetError: 28802:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 13 05:23:10 fir-io7-s1 kernel: LNetError: 28802:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 05:23:10 fir-io7-s1 kernel: LNetError: 28802:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 398 previous similar messages Mar 13 05:26:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 05:26:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 240 previous similar messages Mar 13 05:28:15 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client e3e8af2e-145d-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c78e1f1c000, cur 1584102495 expire 1584102345 last 1584102268 Mar 13 05:28:15 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 05:29:04 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 05:29:04 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 13 05:29:35 fir-io7-s1 kernel: LNetError: 28802:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 05:29:35 fir-io7-s1 kernel: LNetError: 28802:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 13 05:33:10 fir-io7-s1 kernel: LNetError: 29239:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 05:33:10 fir-io7-s1 kernel: LNetError: 29239:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 13 05:36:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 05:36:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 245 previous similar messages Mar 13 05:39:35 fir-io7-s1 kernel: LNetError: 29491:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 05:39:35 fir-io7-s1 kernel: LNetError: 29491:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 13 05:43:10 fir-io7-s1 kernel: LNetError: 29491:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 05:43:10 fir-io7-s1 kernel: LNetError: 29491:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 388 previous similar messages Mar 13 05:47:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 05:47:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 156 previous similar messages Mar 13 05:49:35 fir-io7-s1 kernel: LNetError: 29420:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 05:49:35 fir-io7-s1 kernel: LNetError: 29420:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 13 05:53:15 fir-io7-s1 kernel: LNetError: 29713:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 05:53:15 fir-io7-s1 kernel: LNetError: 29713:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 402 previous similar messages Mar 13 05:57:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 13 05:57:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 292 previous similar messages Mar 13 05:59:35 fir-io7-s1 kernel: LNetError: 30074:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 05:59:35 fir-io7-s1 kernel: LNetError: 30074:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 13 06:03:20 fir-io7-s1 kernel: LNetError: 30292:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 06:03:20 fir-io7-s1 kernel: LNetError: 30292:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 377 previous similar messages Mar 13 06:07:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 06:07:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 175 previous similar messages Mar 13 06:09:35 fir-io7-s1 kernel: LNetError: 29945:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 06:09:35 fir-io7-s1 kernel: LNetError: 29945:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 13 06:13:20 fir-io7-s1 kernel: LNetError: 30292:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 06:13:20 fir-io7-s1 kernel: LNetError: 30292:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 377 previous similar messages Mar 13 06:17:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 06:17:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 109 previous similar messages Mar 13 06:19:35 fir-io7-s1 kernel: LNetError: 30809:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 06:19:35 fir-io7-s1 kernel: LNetError: 30809:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 13 06:23:25 fir-io7-s1 kernel: LNetError: 30809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 06:23:25 fir-io7-s1 kernel: LNetError: 30809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 387 previous similar messages Mar 13 06:27:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 06:27:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 122 previous similar messages Mar 13 06:29:35 fir-io7-s1 kernel: LNetError: 30809:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 06:29:35 fir-io7-s1 kernel: LNetError: 30809:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 13 06:33:26 fir-io7-s1 kernel: LNetError: 31393:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 06:33:26 fir-io7-s1 kernel: LNetError: 31393:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 392 previous similar messages Mar 13 06:37:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 06:37:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages Mar 13 06:39:40 fir-io7-s1 kernel: LNetError: 31393:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 06:39:40 fir-io7-s1 kernel: LNetError: 31393:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 13 06:43:30 fir-io7-s1 kernel: LNetError: 31739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 06:43:30 fir-io7-s1 kernel: LNetError: 31739:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 402 previous similar messages Mar 13 06:44:26 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 47aed9da-7770-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c699095d800, cur 1584107066 expire 1584106916 last 1584106839 Mar 13 06:44:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 06:45:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 06:45:06 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 06:45:06 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 13 06:47:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 06:47:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 106 previous similar messages Mar 13 06:49:45 fir-io7-s1 kernel: LNetError: 31739:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 06:49:45 fir-io7-s1 kernel: LNetError: 31739:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 13 06:53:30 fir-io7-s1 kernel: LNetError: 32100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 06:53:30 fir-io7-s1 kernel: LNetError: 32100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 13 06:57:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 13 06:57:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 615 previous similar messages Mar 13 06:59:45 fir-io7-s1 kernel: LNetError: 32292:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 06:59:45 fir-io7-s1 kernel: LNetError: 32292:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 13 07:03:30 fir-io7-s1 kernel: LNetError: 32292:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 07:03:30 fir-io7-s1 kernel: LNetError: 32292:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 389 previous similar messages Mar 13 07:07:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 13 07:07:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 284 previous similar messages Mar 13 07:09:45 fir-io7-s1 kernel: LNetError: 32606:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 07:09:45 fir-io7-s1 kernel: LNetError: 32606:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 13 07:13:30 fir-io7-s1 kernel: LNetError: 32817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 07:13:30 fir-io7-s1 kernel: LNetError: 32817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 406 previous similar messages Mar 13 07:18:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 07:18:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 326 previous similar messages Mar 13 07:19:45 fir-io7-s1 kernel: LNetError: 32817:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 07:19:45 fir-io7-s1 kernel: LNetError: 32817:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 13 07:23:30 fir-io7-s1 kernel: LNetError: 33167:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 07:23:30 fir-io7-s1 kernel: LNetError: 33167:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 389 previous similar messages Mar 13 07:28:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 07:28:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 136 previous similar messages Mar 13 07:29:45 fir-io7-s1 kernel: LNetError: 33167:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 07:29:45 fir-io7-s1 kernel: LNetError: 33167:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 13 07:33:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 07:33:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 13 07:38:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 07:38:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 140 previous similar messages Mar 13 07:39:45 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 07:39:45 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 13 07:43:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 07:43:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 397 previous similar messages Mar 13 07:48:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 07:48:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 194 previous similar messages Mar 13 07:49:45 fir-io7-s1 kernel: LNetError: 33520:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 07:49:45 fir-io7-s1 kernel: LNetError: 33520:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 13 07:53:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 07:53:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 13 07:58:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 07:58:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 99 previous similar messages Mar 13 07:59:45 fir-io7-s1 kernel: LNetError: 33520:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 07:59:45 fir-io7-s1 kernel: LNetError: 33520:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 13 08:03:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 08:03:30 fir-io7-s1 kernel: LNetError: 33520:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 13 08:09:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 13 08:09:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 75 previous similar messages Mar 13 08:09:45 fir-io7-s1 kernel: LNetError: 34732:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 08:09:45 fir-io7-s1 kernel: LNetError: 34732:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 13 08:13:30 fir-io7-s1 kernel: LNetError: 34945:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 08:13:30 fir-io7-s1 kernel: LNetError: 34945:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 376 previous similar messages Mar 13 08:19:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 08:19:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 35 previous similar messages Mar 13 08:19:45 fir-io7-s1 kernel: LNetError: 35155:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 08:19:45 fir-io7-s1 kernel: LNetError: 35155:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 13 08:23:35 fir-io7-s1 kernel: LNetError: 34945:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 08:23:35 fir-io7-s1 kernel: LNetError: 34945:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 13 08:29:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 08:29:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 107 previous similar messages Mar 13 08:29:50 fir-io7-s1 kernel: LNetError: 34770:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 08:29:50 fir-io7-s1 kernel: LNetError: 34770:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 13 08:33:35 fir-io7-s1 kernel: LNetError: 35744:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 08:33:35 fir-io7-s1 kernel: LNetError: 35744:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 13 08:39:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 08:39:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 196 previous similar messages Mar 13 08:39:50 fir-io7-s1 kernel: LNetError: 35744:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 08:39:50 fir-io7-s1 kernel: LNetError: 35744:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 13 08:43:40 fir-io7-s1 kernel: LNetError: 35991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 08:43:40 fir-io7-s1 kernel: LNetError: 35991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 13 08:49:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 08:49:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 614 previous similar messages Mar 13 08:49:50 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 08:49:50 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 153 previous similar messages Mar 13 08:53:45 fir-io7-s1 kernel: LNetError: 35991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 08:53:45 fir-io7-s1 kernel: LNetError: 35991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 13 08:59:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 13 08:59:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 275 previous similar messages Mar 13 08:59:50 fir-io7-s1 kernel: LNetError: 36486:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 08:59:50 fir-io7-s1 kernel: LNetError: 36486:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 13 09:03:50 fir-io7-s1 kernel: LNetError: 36689:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 09:03:50 fir-io7-s1 kernel: LNetError: 36689:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 13 09:09:50 fir-io7-s1 kernel: LNetError: 36689:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 09:09:50 fir-io7-s1 kernel: LNetError: 36689:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 13 09:09:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 09:09:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 237 previous similar messages Mar 13 09:13:50 fir-io7-s1 kernel: LNetError: 37061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 09:13:50 fir-io7-s1 kernel: LNetError: 37061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 13 09:19:50 fir-io7-s1 kernel: LNetError: 37061:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 09:19:50 fir-io7-s1 kernel: LNetError: 37061:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 13 09:20:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 13 09:20:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 272 previous similar messages Mar 13 09:23:55 fir-io7-s1 kernel: LNetError: 37408:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 09:23:55 fir-io7-s1 kernel: LNetError: 37408:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 13 09:29:55 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 09:29:55 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 13 09:30:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 09:30:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 290 previous similar messages Mar 13 09:33:55 fir-io7-s1 kernel: LNetError: 37902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 09:33:55 fir-io7-s1 kernel: LNetError: 37902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 360 previous similar messages Mar 13 09:39:55 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 09:39:55 fir-io7-s1 kernel: LNetError: 91381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 13 09:40:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 13 09:40:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 127 previous similar messages Mar 13 09:43:55 fir-io7-s1 kernel: LNetError: 37902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 09:43:55 fir-io7-s1 kernel: LNetError: 37902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 352 previous similar messages Mar 13 09:49:55 fir-io7-s1 kernel: LNetError: 37902:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 09:49:55 fir-io7-s1 kernel: LNetError: 37902:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 13 09:50:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 09:50:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 157 previous similar messages Mar 13 09:53:55 fir-io7-s1 kernel: LNetError: 37902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 09:53:55 fir-io7-s1 kernel: LNetError: 37902:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 13 09:59:55 fir-io7-s1 kernel: LNetError: 38704:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 09:59:55 fir-io7-s1 kernel: LNetError: 38704:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 13 10:01:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 10:01:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 160 previous similar messages Mar 13 10:03:55 fir-io7-s1 kernel: LNetError: 38609:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 10:03:55 fir-io7-s1 kernel: LNetError: 38609:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 13 10:09:55 fir-io7-s1 kernel: LNetError: 39114:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 10:09:55 fir-io7-s1 kernel: LNetError: 39114:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 13 10:11:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 13 10:11:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages Mar 13 10:13:55 fir-io7-s1 kernel: LNetError: 38979:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 10:13:55 fir-io7-s1 kernel: LNetError: 38979:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages Mar 13 10:19:55 fir-io7-s1 kernel: LNetError: 38924:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 10:19:55 fir-io7-s1 kernel: LNetError: 38924:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 13 10:21:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 13 10:21:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 158 previous similar messages Mar 13 10:23:55 fir-io7-s1 kernel: LNetError: 39330:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 10:23:55 fir-io7-s1 kernel: LNetError: 39330:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 331 previous similar messages Mar 13 10:29:55 fir-io7-s1 kernel: LNetError: 39679:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 10:29:55 fir-io7-s1 kernel: LNetError: 39679:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 13 10:31:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 10:31:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 112 previous similar messages Mar 13 10:33:55 fir-io7-s1 kernel: LNetError: 39889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 10:33:55 fir-io7-s1 kernel: LNetError: 39889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 13 10:39:55 fir-io7-s1 kernel: LNetError: 88424:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 10:39:55 fir-io7-s1 kernel: LNetError: 88424:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 13 10:41:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 10:41:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 58 previous similar messages Mar 13 10:43:55 fir-io7-s1 kernel: LNetError: 39889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 10:43:55 fir-io7-s1 kernel: LNetError: 39889:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 348 previous similar messages Mar 13 10:48:38 fir-io7-s1 kernel: LustreError: 84743:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli a387b29d-067d-4 claims 28672 GRANT, real grant 4096 Mar 13 10:49:56 fir-io7-s1 kernel: LNetError: 40404:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 10:49:56 fir-io7-s1 kernel: LNetError: 40404:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 13 10:52:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 13 10:52:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 116 previous similar messages Mar 13 10:54:00 fir-io7-s1 kernel: LNetError: 40610:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 10:54:00 fir-io7-s1 kernel: LNetError: 40610:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 359 previous similar messages Mar 13 10:55:06 fir-io7-s1 kernel: Lustre: fir-OST0052: Client bf9c0359-a266-4 (at 10.50.0.1@o2ib2) reconnecting Mar 13 10:55:06 fir-io7-s1 kernel: Lustre: fir-OST0052: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 13 10:55:06 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 13 10:55:45 fir-io7-s1 kernel: Lustre: fir-OST0052: Connection restored to bf9c0359-a266-4 (at 10.50.0.1@o2ib2) Mar 13 11:00:00 fir-io7-s1 kernel: LNetError: 40610:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 11:00:00 fir-io7-s1 kernel: LNetError: 40610:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 13 11:02:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 11:02:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 99 previous similar messages Mar 13 11:04:00 fir-io7-s1 kernel: LNetError: 41054:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 11:04:00 fir-io7-s1 kernel: LNetError: 41054:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 13 11:10:00 fir-io7-s1 kernel: LNetError: 41054:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 11:10:00 fir-io7-s1 kernel: LNetError: 41054:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 13 11:12:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 11:12:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 232 previous similar messages Mar 13 11:14:00 fir-io7-s1 kernel: LNetError: 41427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 11:14:00 fir-io7-s1 kernel: LNetError: 41427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 355 previous similar messages Mar 13 11:20:00 fir-io7-s1 kernel: LNetError: 41427:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 11:20:00 fir-io7-s1 kernel: LNetError: 41427:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages Mar 13 11:22:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 13 11:22:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 574 previous similar messages Mar 13 11:24:00 fir-io7-s1 kernel: LNetError: 41781:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 11:24:00 fir-io7-s1 kernel: LNetError: 41781:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 13 11:30:00 fir-io7-s1 kernel: LNetError: 41781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 11:30:00 fir-io7-s1 kernel: LNetError: 41781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 13 11:32:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 11:32:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 397 previous similar messages Mar 13 11:34:00 fir-io7-s1 kernel: LNetError: 42134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 11:34:00 fir-io7-s1 kernel: LNetError: 42134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 13 11:40:00 fir-io7-s1 kernel: LNetError: 42134:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 11:40:00 fir-io7-s1 kernel: LNetError: 42134:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 13 11:42:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 13 11:42:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 239 previous similar messages Mar 13 11:44:00 fir-io7-s1 kernel: LNetError: 42484:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 11:44:00 fir-io7-s1 kernel: LNetError: 42484:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 13 11:50:00 fir-io7-s1 kernel: LNetError: 42484:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 11:50:00 fir-io7-s1 kernel: LNetError: 42484:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 13 11:52:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 13 11:52:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 347 previous similar messages Mar 13 11:54:00 fir-io7-s1 kernel: LNetError: 42838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 11:54:00 fir-io7-s1 kernel: LNetError: 42838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 331 previous similar messages Mar 13 12:00:00 fir-io7-s1 kernel: LNetError: 42838:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 12:00:00 fir-io7-s1 kernel: LNetError: 42838:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 13 12:03:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 13 12:03:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 154 previous similar messages Mar 13 12:04:00 fir-io7-s1 kernel: LNetError: 42838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 12:04:00 fir-io7-s1 kernel: LNetError: 42838:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 13 12:08:53 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 0f186c88-bcb4-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6bfe4cb800, cur 1584126533 expire 1584126383 last 1584126306 Mar 13 12:08:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:09:34 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 12:09:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:10:00 fir-io7-s1 kernel: LNetError: 43344:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 12:10:00 fir-io7-s1 kernel: LNetError: 43344:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 13 12:13:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 13 12:13:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 136 previous similar messages Mar 13 12:14:00 fir-io7-s1 kernel: LNetError: 43344:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 12:14:00 fir-io7-s1 kernel: LNetError: 43344:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 331 previous similar messages Mar 13 12:20:05 fir-io7-s1 kernel: LNetError: 43699:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 12:20:05 fir-io7-s1 kernel: LNetError: 43699:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 13 12:22:37 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 05864610-63c0-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd7e6000, cur 1584127357 expire 1584127207 last 1584127130 Mar 13 12:22:37 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:22:56 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 12:22:56 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:23:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 12:23:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 124 previous similar messages Mar 13 12:24:00 fir-io7-s1 kernel: LNetError: 43920:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 12:24:00 fir-io7-s1 kernel: LNetError: 43920:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 13 12:25:49 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 4ea22083-1f76-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c59f2285000, cur 1584127549 expire 1584127399 last 1584127322 Mar 13 12:25:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:25:58 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 4ea22083-1f76-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5a2cf8c400, cur 1584127558 expire 1584127408 last 1584127331 Mar 13 12:25:58 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 13 12:25:59 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 12:25:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:26:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 12:26:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:27:05 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 23eec0b9-0e96-4 (at 10.49.26.4@o2ib1) in 223 seconds. I think it's dead, and I am evicting it. exp ffff9c7799c53800, cur 1584127625 expire 1584127475 last 1584127402 Mar 13 12:27:05 fir-io7-s1 kernel: Lustre: Skipped 7 previous similar messages Mar 13 12:30:05 fir-io7-s1 kernel: LNetError: 43920:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 12:30:05 fir-io7-s1 kernel: LNetError: 43920:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 13 12:33:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 13 12:33:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 215 previous similar messages Mar 13 12:34:00 fir-io7-s1 kernel: LNetError: 44273:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 12:34:00 fir-io7-s1 kernel: LNetError: 44273:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 353 previous similar messages Mar 13 12:35:52 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 12:35:52 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:36:39 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 78beaa53-7f08-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698eb31000, cur 1584128199 expire 1584128049 last 1584127972 Mar 13 12:36:39 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 13 12:36:54 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 78beaa53-7f08-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c5b4f9bb000, cur 1584128214 expire 1584128064 last 1584127987 Mar 13 12:40:05 fir-io7-s1 kernel: LNetError: 44273:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 12:40:05 fir-io7-s1 kernel: LNetError: 44273:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 13 12:43:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 13 12:43:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 153 previous similar messages Mar 13 12:44:00 fir-io7-s1 kernel: LNetError: 44273:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 12:44:00 fir-io7-s1 kernel: LNetError: 44273:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 13 12:44:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 12:44:28 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:45:31 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 73c35f00-40db-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c519964b000, cur 1584128731 expire 1584128581 last 1584128504 Mar 13 12:45:31 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 12:47:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 12:47:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:48:15 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 227f0129-25d2-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f22c800, cur 1584128895 expire 1584128745 last 1584128668 Mar 13 12:48:15 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:50:05 fir-io7-s1 kernel: LNetError: 44750:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 12:50:05 fir-io7-s1 kernel: LNetError: 44750:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 13 12:50:46 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 12:50:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:51:51 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 529d1a8c-397a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f91f96400, cur 1584129111 expire 1584128961 last 1584128884 Mar 13 12:51:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 12:53:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 12:53:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 189 previous similar messages Mar 13 12:54:00 fir-io7-s1 kernel: LNetError: 44971:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 12:54:00 fir-io7-s1 kernel: LNetError: 44971:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 344 previous similar messages Mar 13 13:00:05 fir-io7-s1 kernel: LNetError: 44971:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 13:00:05 fir-io7-s1 kernel: LNetError: 44971:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 13 13:01:22 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 3328da11-ed85-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7a01e03400, cur 1584129682 expire 1584129532 last 1584129455 Mar 13 13:01:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:01:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 13:01:51 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 13:03:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 13:03:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 137 previous similar messages Mar 13 13:04:00 fir-io7-s1 kernel: LNetError: 45317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 13:04:00 fir-io7-s1 kernel: LNetError: 45317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 324 previous similar messages Mar 13 13:06:53 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client d4754147-66ad-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c73314d8000, cur 1584130013 expire 1584129863 last 1584129786 Mar 13 13:06:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:07:28 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 13:07:28 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 13 13:08:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 13:08:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:10:05 fir-io7-s1 kernel: LNetError: 45317:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 13:10:05 fir-io7-s1 kernel: LNetError: 45317:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 13 13:13:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 3 seconds Mar 13 13:13:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages Mar 13 13:14:05 fir-io7-s1 kernel: LNetError: 45317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 13:14:05 fir-io7-s1 kernel: LNetError: 45317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 13 13:14:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 13:14:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:16:07 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 0ae5f35d-5bc6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6cfcfb3800, cur 1584130567 expire 1584130417 last 1584130340 Mar 13 13:16:07 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 13 13:20:05 fir-io7-s1 kernel: LNetError: 46003:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 13:20:05 fir-io7-s1 kernel: LNetError: 46003:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 13 13:22:04 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 13:22:04 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:24:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 13 13:24:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 262 previous similar messages Mar 13 13:24:05 fir-io7-s1 kernel: LNetError: 46089:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 13:24:05 fir-io7-s1 kernel: LNetError: 46089:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 326 previous similar messages Mar 13 13:24:50 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 13:24:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:30:05 fir-io7-s1 kernel: LNetError: 46306:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 13:30:05 fir-io7-s1 kernel: LNetError: 46306:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 13 13:34:10 fir-io7-s1 kernel: LNetError: 46306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 13:34:10 fir-io7-s1 kernel: LNetError: 46306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 13 13:34:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 13:34:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 179 previous similar messages Mar 13 13:34:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 13:34:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:35:34 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 1fc63cfa-a2cc-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6b34efe800, cur 1584131734 expire 1584131584 last 1584131507 Mar 13 13:35:34 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 13 13:40:05 fir-io7-s1 kernel: LNetError: 46519:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 13:40:05 fir-io7-s1 kernel: LNetError: 46519:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 13 13:40:49 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 13:40:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 13:44:10 fir-io7-s1 kernel: LNetError: 46306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 13:44:10 fir-io7-s1 kernel: LNetError: 46306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 328 previous similar messages Mar 13 13:44:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 13:44:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 193 previous similar messages Mar 13 13:50:05 fir-io7-s1 kernel: LNetError: 46762:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 13:50:05 fir-io7-s1 kernel: LNetError: 46762:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 13 13:54:11 fir-io7-s1 kernel: LNetError: 46901:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 13:54:11 fir-io7-s1 kernel: LNetError: 46901:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 305 previous similar messages Mar 13 13:54:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 13 13:54:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 172 previous similar messages Mar 13 14:00:06 fir-io7-s1 kernel: LNetError: 46762:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 14:00:06 fir-io7-s1 kernel: LNetError: 46762:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 13 14:04:15 fir-io7-s1 kernel: LNetError: 40976:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 14:04:15 fir-io7-s1 kernel: LNetError: 40976:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 325 previous similar messages Mar 13 14:05:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 13 14:05:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 123 previous similar messages Mar 13 14:10:08 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 14:10:08 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 13 14:14:15 fir-io7-s1 kernel: LNetError: 47814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 14:14:15 fir-io7-s1 kernel: LNetError: 47814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 311 previous similar messages Mar 13 14:15:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 14:15:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 138 previous similar messages Mar 13 14:20:10 fir-io7-s1 kernel: LNetError: 47814:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 14:20:10 fir-io7-s1 kernel: LNetError: 47814:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 13 14:24:15 fir-io7-s1 kernel: LNetError: 47814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 14:24:15 fir-io7-s1 kernel: LNetError: 47814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 13 14:25:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 13 14:25:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 107 previous similar messages Mar 13 14:30:10 fir-io7-s1 kernel: LNetError: 48397:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 14:30:10 fir-io7-s1 kernel: LNetError: 48397:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 147 previous similar messages Mar 13 14:34:20 fir-io7-s1 kernel: LNetError: 48334:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 14:34:20 fir-io7-s1 kernel: LNetError: 48334:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 13 14:36:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 5 seconds Mar 13 14:36:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 138 previous similar messages Mar 13 14:40:10 fir-io7-s1 kernel: LNetError: 48689:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 14:40:10 fir-io7-s1 kernel: LNetError: 48689:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 13 14:44:30 fir-io7-s1 kernel: LNetError: 48897:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 14:44:30 fir-io7-s1 kernel: LNetError: 48897:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 13 14:46:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 14:46:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 206 previous similar messages Mar 13 14:46:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 3504afd3-d32d-4 (at 10.49.25.17@o2ib1) Mar 13 14:46:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 14:47:06 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 3504afd3-d32d-4 (at 10.49.25.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698eb33400, cur 1584136026 expire 1584135876 last 1584135799 Mar 13 14:47:06 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 13 14:50:10 fir-io7-s1 kernel: LNetError: 48897:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 14:50:10 fir-io7-s1 kernel: LNetError: 48897:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 13 14:54:35 fir-io7-s1 kernel: LNetError: 48897:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 14:54:35 fir-io7-s1 kernel: LNetError: 48897:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 13 14:56:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 14:56:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 146 previous similar messages Mar 13 15:00:10 fir-io7-s1 kernel: LNetError: 49397:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 15:00:10 fir-io7-s1 kernel: LNetError: 49397:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 13 15:04:35 fir-io7-s1 kernel: LNetError: 49594:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 15:04:35 fir-io7-s1 kernel: LNetError: 49594:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 13 15:06:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 13 15:06:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 174 previous similar messages Mar 13 15:10:10 fir-io7-s1 kernel: LNetError: 49919:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 15:10:10 fir-io7-s1 kernel: LNetError: 49919:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 13 15:14:35 fir-io7-s1 kernel: LNetError: 49919:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 15:14:35 fir-io7-s1 kernel: LNetError: 49919:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 314 previous similar messages Mar 13 15:16:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 15:16:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 572 previous similar messages Mar 13 15:20:15 fir-io7-s1 kernel: LNetError: 50117:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 15:20:15 fir-io7-s1 kernel: LNetError: 50117:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 13 15:24:35 fir-io7-s1 kernel: LNetError: 50352:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 15:24:35 fir-io7-s1 kernel: LNetError: 50352:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 340 previous similar messages Mar 13 15:26:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 6 seconds Mar 13 15:26:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 362 previous similar messages Mar 13 15:30:15 fir-io7-s1 kernel: LNetError: 50534:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 15:30:15 fir-io7-s1 kernel: LNetError: 50534:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 13 15:34:35 fir-io7-s1 kernel: LNetError: 50534:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 15:34:35 fir-io7-s1 kernel: LNetError: 50534:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 13 15:36:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 15:36:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 290 previous similar messages Mar 13 15:40:15 fir-io7-s1 kernel: LNetError: 50854:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 15:40:15 fir-io7-s1 kernel: LNetError: 50854:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 13 15:44:35 fir-io7-s1 kernel: LNetError: 50854:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 15:44:35 fir-io7-s1 kernel: LNetError: 50854:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 13 15:46:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 3 seconds Mar 13 15:46:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 261 previous similar messages Mar 13 15:50:15 fir-io7-s1 kernel: LNetError: 50854:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 15:50:15 fir-io7-s1 kernel: LNetError: 50854:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 216 previous similar messages Mar 13 15:54:35 fir-io7-s1 kernel: LNetError: 51428:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 15:54:35 fir-io7-s1 kernel: LNetError: 51428:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 13 15:57:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 4 seconds Mar 13 15:57:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 178 previous similar messages Mar 13 16:00:15 fir-io7-s1 kernel: LNetError: 51428:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 16:00:15 fir-io7-s1 kernel: LNetError: 51428:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 13 16:04:40 fir-io7-s1 kernel: LNetError: 51778:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 16:04:40 fir-io7-s1 kernel: LNetError: 51778:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 13 16:07:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 1 seconds Mar 13 16:07:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 273 previous similar messages Mar 13 16:10:15 fir-io7-s1 kernel: LNetError: 51778:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 16:10:15 fir-io7-s1 kernel: LNetError: 51778:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 13 16:14:45 fir-io7-s1 kernel: LNetError: 52142:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 16:14:45 fir-io7-s1 kernel: LNetError: 52142:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 13 16:17:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 13 16:17:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 212 previous similar messages Mar 13 16:20:15 fir-io7-s1 kernel: LNetError: 52142:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 16:20:15 fir-io7-s1 kernel: LNetError: 52142:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 13 16:24:50 fir-io7-s1 kernel: LNetError: 52496:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 16:24:50 fir-io7-s1 kernel: LNetError: 52496:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 323 previous similar messages Mar 13 16:27:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 2 seconds Mar 13 16:27:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 226 previous similar messages Mar 13 16:30:15 fir-io7-s1 kernel: LNetError: 52496:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 16:30:15 fir-io7-s1 kernel: LNetError: 52496:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 13 16:34:50 fir-io7-s1 kernel: LNetError: 52844:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 16:34:50 fir-io7-s1 kernel: LNetError: 52844:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 13 16:37:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 13 16:37:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 258 previous similar messages Mar 13 16:40:15 fir-io7-s1 kernel: LNetError: 52844:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 16:40:15 fir-io7-s1 kernel: LNetError: 52844:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 13 16:44:50 fir-io7-s1 kernel: LNetError: 53194:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 16:44:50 fir-io7-s1 kernel: LNetError: 53194:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 13 16:45:59 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client d2edbd7e-4c6b-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c54650fc400, cur 1584143159 expire 1584143009 last 1584142932 Mar 13 16:45:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 16:46:49 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 16:46:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 16:47:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 5 seconds Mar 13 16:47:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 183 previous similar messages Mar 13 16:50:15 fir-io7-s1 kernel: LNetError: 53194:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 16:50:15 fir-io7-s1 kernel: LNetError: 53194:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 13 16:54:50 fir-io7-s1 kernel: LNetError: 53542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 16:54:50 fir-io7-s1 kernel: LNetError: 53542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 13 16:57:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 13 16:57:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 225 previous similar messages Mar 13 17:00:15 fir-io7-s1 kernel: LNetError: 53542:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 17:00:15 fir-io7-s1 kernel: LNetError: 53542:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 13 17:02:19 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 3ab8c95c-ea7b-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7319cf0400, cur 1584144139 expire 1584143989 last 1584143912 Mar 13 17:02:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 17:03:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 17:03:09 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 17:04:55 fir-io7-s1 kernel: LNetError: 53542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 17:04:55 fir-io7-s1 kernel: LNetError: 53542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 13 17:07:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 13 17:07:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 169 previous similar messages Mar 13 17:10:15 fir-io7-s1 kernel: LNetError: 54073:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 17:10:15 fir-io7-s1 kernel: LNetError: 54073:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 13 17:14:55 fir-io7-s1 kernel: LNetError: 54263:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 17:14:55 fir-io7-s1 kernel: LNetError: 54263:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 323 previous similar messages Mar 13 17:17:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 13 17:17:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 259 previous similar messages Mar 13 17:20:15 fir-io7-s1 kernel: LNetError: 40976:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 17:20:15 fir-io7-s1 kernel: LNetError: 40976:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 13 17:25:00 fir-io7-s1 kernel: LNetError: 54503:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 17:25:00 fir-io7-s1 kernel: LNetError: 54503:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 13 17:28:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 17:28:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 149 previous similar messages Mar 13 17:30:15 fir-io7-s1 kernel: LNetError: 54775:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 17:30:15 fir-io7-s1 kernel: LNetError: 54775:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 13 17:35:00 fir-io7-s1 kernel: LNetError: 54775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 17:35:00 fir-io7-s1 kernel: LNetError: 54775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 309 previous similar messages Mar 13 17:38:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 13 17:38:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 179 previous similar messages Mar 13 17:40:15 fir-io7-s1 kernel: LNetError: 53395:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 17:40:15 fir-io7-s1 kernel: LNetError: 53395:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 13 17:45:00 fir-io7-s1 kernel: LNetError: 55122:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 17:45:00 fir-io7-s1 kernel: LNetError: 55122:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 13 17:47:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 17:47:42 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 13 17:48:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 17:48:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 202 previous similar messages Mar 13 17:48:35 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 2d33399d-76f1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c773ef65800, cur 1584146915 expire 1584146765 last 1584146688 Mar 13 17:48:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 17:48:38 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 2d33399d-76f1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c534908a400, cur 1584146918 expire 1584146768 last 1584146691 Mar 13 17:50:17 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 17:50:17 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 13 17:55:00 fir-io7-s1 kernel: LNetError: 55475:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 17:55:00 fir-io7-s1 kernel: LNetError: 55475:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 13 17:56:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 13 17:56:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 17:57:47 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 505cc99a-257c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e534800, cur 1584147467 expire 1584147317 last 1584147240 Mar 13 17:57:47 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 17:58:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 13 17:58:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 194 previous similar messages Mar 13 18:00:20 fir-io7-s1 kernel: LNetError: 55822:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 18:00:20 fir-io7-s1 kernel: LNetError: 55822:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 13 18:05:05 fir-io7-s1 kernel: LNetError: 55822:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 18:05:05 fir-io7-s1 kernel: LNetError: 55822:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 309 previous similar messages Mar 13 18:08:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 13 18:08:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 190 previous similar messages Mar 13 18:10:20 fir-io7-s1 kernel: LNetError: 55532:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 18:10:20 fir-io7-s1 kernel: LNetError: 55532:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 13 18:15:05 fir-io7-s1 kernel: LNetError: 56199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 18:15:05 fir-io7-s1 kernel: LNetError: 56199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 13 18:18:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 13 18:18:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 170 previous similar messages Mar 13 18:20:20 fir-io7-s1 kernel: LNetError: 56199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 18:20:20 fir-io7-s1 kernel: LNetError: 56199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 13 18:25:05 fir-io7-s1 kernel: LNetError: 56199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 18:25:05 fir-io7-s1 kernel: LNetError: 56199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 376 previous similar messages Mar 13 18:29:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 13 18:29:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 153 previous similar messages Mar 13 18:30:20 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 18:30:20 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 13 18:35:06 fir-io7-s1 kernel: LNetError: 56899:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 18:35:06 fir-io7-s1 kernel: LNetError: 56899:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 13 18:39:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 13 18:39:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 84 previous similar messages Mar 13 18:40:21 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 18:40:21 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 13 18:45:10 fir-io7-s1 kernel: LNetError: 57443:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 18:45:10 fir-io7-s1 kernel: LNetError: 57443:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 384 previous similar messages Mar 13 18:49:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 18:49:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 175 previous similar messages Mar 13 18:50:24 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 18:50:24 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 13 18:55:10 fir-io7-s1 kernel: LNetError: 57443:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 18:55:10 fir-io7-s1 kernel: LNetError: 57443:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 398 previous similar messages Mar 13 18:59:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 13 18:59:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 482 previous similar messages Mar 13 19:00:25 fir-io7-s1 kernel: LNetError: 57947:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 19:00:25 fir-io7-s1 kernel: LNetError: 57947:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 13 19:05:10 fir-io7-s1 kernel: LNetError: 58142:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 19:05:10 fir-io7-s1 kernel: LNetError: 58142:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 13 19:09:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 13 19:09:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 203 previous similar messages Mar 13 19:10:25 fir-io7-s1 kernel: LNetError: 58142:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 19:10:25 fir-io7-s1 kernel: LNetError: 58142:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 13 19:15:10 fir-io7-s1 kernel: LNetError: 58512:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 19:15:10 fir-io7-s1 kernel: LNetError: 58512:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 413 previous similar messages Mar 13 19:19:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 13 19:19:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 169 previous similar messages Mar 13 19:20:25 fir-io7-s1 kernel: LNetError: 58689:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 19:20:25 fir-io7-s1 kernel: LNetError: 58689:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 13 19:25:07 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 0297d18c-dff2-4 (at 10.50.5.33@o2ib2) Mar 13 19:25:07 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 19:25:10 fir-io7-s1 kernel: LNetError: 58689:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 19:25:10 fir-io7-s1 kernel: LNetError: 58689:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 411 previous similar messages Mar 13 19:25:44 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 0297d18c-dff2-4 (at 10.50.5.33@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992e5fc00, cur 1584152744 expire 1584152594 last 1584152517 Mar 13 19:25:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 19:25:52 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 0297d18c-dff2-4 (at 10.50.5.33@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992e5a000, cur 1584152752 expire 1584152602 last 1584152525 Mar 13 19:25:52 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 13 19:25:54 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 0297d18c-dff2-4 (at 10.50.5.33@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c768773c800, cur 1584152754 expire 1584152604 last 1584152527 Mar 13 19:26:03 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 0297d18c-dff2-4 (at 10.50.5.33@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6992e5c000, cur 1584152763 expire 1584152613 last 1584152536 Mar 13 19:29:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 13 19:29:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 204 previous similar messages Mar 13 19:30:25 fir-io7-s1 kernel: LNetError: 58689:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 19:30:25 fir-io7-s1 kernel: LNetError: 58689:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 13 19:35:10 fir-io7-s1 kernel: LNetError: 59213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 19:35:10 fir-io7-s1 kernel: LNetError: 59213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 405 previous similar messages Mar 13 19:39:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 19:39:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 52 previous similar messages Mar 13 19:40:25 fir-io7-s1 kernel: LNetError: 59437:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 19:40:25 fir-io7-s1 kernel: LNetError: 59437:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 13 19:45:10 fir-io7-s1 kernel: LNetError: 59213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 19:45:10 fir-io7-s1 kernel: LNetError: 59213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 13 19:49:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 13 19:49:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 143 previous similar messages Mar 13 19:50:25 fir-io7-s1 kernel: LNetError: 58232:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 19:50:25 fir-io7-s1 kernel: LNetError: 58232:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 13 19:55:15 fir-io7-s1 kernel: LNetError: 59213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 19:55:15 fir-io7-s1 kernel: LNetError: 59213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 13 19:59:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 13 19:59:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 177 previous similar messages Mar 13 20:00:26 fir-io7-s1 kernel: LNetError: 60070:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 20:00:26 fir-io7-s1 kernel: LNetError: 60070:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 13 20:05:20 fir-io7-s1 kernel: LNetError: 60271:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 20:05:20 fir-io7-s1 kernel: LNetError: 60271:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 370 previous similar messages Mar 13 20:10:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 13 20:10:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 131 previous similar messages Mar 13 20:10:30 fir-io7-s1 kernel: LNetError: 60271:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 20:10:30 fir-io7-s1 kernel: LNetError: 60271:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 13 20:15:20 fir-io7-s1 kernel: LNetError: 60639:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 20:15:20 fir-io7-s1 kernel: LNetError: 60639:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 334 previous similar messages Mar 13 20:20:30 fir-io7-s1 kernel: LNetError: 60639:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 20:20:30 fir-io7-s1 kernel: LNetError: 60639:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 13 20:20:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 20:20:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 191 previous similar messages Mar 13 20:25:30 fir-io7-s1 kernel: LNetError: 60991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 20:25:30 fir-io7-s1 kernel: LNetError: 60991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 13 20:29:52 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client ac8d5649-97ac-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69929a6800, cur 1584156592 expire 1584156442 last 1584156365 Mar 13 20:30:30 fir-io7-s1 kernel: LNetError: 60991:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 20:30:30 fir-io7-s1 kernel: LNetError: 60991:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 13 20:30:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 20:30:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 244 previous similar messages Mar 13 20:30:39 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 20:30:39 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 20:35:30 fir-io7-s1 kernel: LNetError: 61347:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 20:35:30 fir-io7-s1 kernel: LNetError: 61347:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 13 20:40:30 fir-io7-s1 kernel: LNetError: 61347:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 20:40:30 fir-io7-s1 kernel: LNetError: 61347:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 13 20:40:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 13 20:40:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 100 previous similar messages Mar 13 20:44:53 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client a70fba3e-eb19-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5816afa400, cur 1584157493 expire 1584157343 last 1584157266 Mar 13 20:44:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 20:45:35 fir-io7-s1 kernel: LNetError: 61700:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 20:45:35 fir-io7-s1 kernel: LNetError: 61700:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 13 20:45:39 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 20:45:39 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 13 20:50:30 fir-io7-s1 kernel: LNetError: 61230:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 20:50:30 fir-io7-s1 kernel: LNetError: 61230:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 13 20:51:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 13 20:51:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 153 previous similar messages Mar 13 20:55:35 fir-io7-s1 kernel: LNetError: 61868:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 20:55:35 fir-io7-s1 kernel: LNetError: 61868:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 319 previous similar messages Mar 13 21:00:30 fir-io7-s1 kernel: LNetError: 62218:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 21:00:30 fir-io7-s1 kernel: LNetError: 62218:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 13 21:01:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 13 21:01:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 566 previous similar messages Mar 13 21:05:35 fir-io7-s1 kernel: LNetError: 62218:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 21:05:35 fir-io7-s1 kernel: LNetError: 62218:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 332 previous similar messages Mar 13 21:10:35 fir-io7-s1 kernel: LNetError: 62589:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 21:10:35 fir-io7-s1 kernel: LNetError: 62589:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 131 previous similar messages Mar 13 21:11:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 13 21:11:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 617 previous similar messages Mar 13 21:15:40 fir-io7-s1 kernel: LNetError: 62589:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 21:15:40 fir-io7-s1 kernel: LNetError: 62589:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 13 21:20:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 21:20:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 13 21:20:51 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client b0a8a07d-23fd-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd4bc400, cur 1584159651 expire 1584159501 last 1584159424 Mar 13 21:20:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 21:21:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 13 21:21:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 459 previous similar messages Mar 13 21:21:33 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 21:21:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 21:25:40 fir-io7-s1 kernel: LNetError: 62940:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 21:25:40 fir-io7-s1 kernel: LNetError: 62940:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 13 21:30:40 fir-io7-s1 kernel: LNetError: 62439:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 21:30:40 fir-io7-s1 kernel: LNetError: 62439:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 13 21:31:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 13 21:31:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 254 previous similar messages Mar 13 21:35:40 fir-io7-s1 kernel: LNetError: 63582:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 21:35:40 fir-io7-s1 kernel: LNetError: 63582:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 13 21:40:40 fir-io7-s1 kernel: LNetError: 63582:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 21:40:40 fir-io7-s1 kernel: LNetError: 63582:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 13 21:41:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 13 21:41:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 289 previous similar messages Mar 13 21:45:45 fir-io7-s1 kernel: LNetError: 63817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 21:45:45 fir-io7-s1 kernel: LNetError: 63817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 13 21:50:40 fir-io7-s1 kernel: LNetError: 63993:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 21:50:40 fir-io7-s1 kernel: LNetError: 63993:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 13 21:51:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 13 21:51:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 338 previous similar messages Mar 13 21:55:45 fir-io7-s1 kernel: LNetError: 63993:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 21:55:45 fir-io7-s1 kernel: LNetError: 63993:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 13 22:00:40 fir-io7-s1 kernel: LNetError: 64368:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 22:00:40 fir-io7-s1 kernel: LNetError: 64368:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 13 22:01:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 13 22:01:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 500 previous similar messages Mar 13 22:04:39 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 777401c5-5492-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c75a4c8a400, cur 1584162279 expire 1584162129 last 1584162052 Mar 13 22:04:39 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 22:05:45 fir-io7-s1 kernel: LNetError: 62087:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 22:05:45 fir-io7-s1 kernel: LNetError: 62087:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 13 22:05:52 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 13 22:05:52 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 22:10:40 fir-io7-s1 kernel: LNetError: 64097:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 22:10:40 fir-io7-s1 kernel: LNetError: 64097:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 13 22:11:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 13 22:11:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 320 previous similar messages Mar 13 22:13:55 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 9a311964-9cf4-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f664800, cur 1584162835 expire 1584162685 last 1584162608 Mar 13 22:13:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 22:14:39 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 4ecd6e0d-238a-4 (at 10.50.9.37@o2ib2) Mar 13 22:14:39 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 22:15:45 fir-io7-s1 kernel: LNetError: 64701:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 13 22:15:45 fir-io7-s1 kernel: LNetError: 64701:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 305 previous similar messages Mar 13 22:20:40 fir-io7-s1 kernel: LNetError: 65021:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 22:20:40 fir-io7-s1 kernel: LNetError: 65021:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 13 22:21:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 13 22:21:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 360 previous similar messages Mar 13 22:25:55 fir-io7-s1 kernel: LNetError: 65179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 22:25:55 fir-io7-s1 kernel: LNetError: 65179:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 13 22:30:40 fir-io7-s1 kernel: LNetError: 65532:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 22:30:40 fir-io7-s1 kernel: LNetError: 65532:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 13 22:32:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 22:32:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 423 previous similar messages Mar 13 22:35:55 fir-io7-s1 kernel: LNetError: 65532:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 22:35:55 fir-io7-s1 kernel: LNetError: 65532:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 13 22:40:40 fir-io7-s1 kernel: LNetError: 65850:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 22:40:40 fir-io7-s1 kernel: LNetError: 65850:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 13 22:42:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 13 22:42:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 264 previous similar messages Mar 13 22:45:55 fir-io7-s1 kernel: LNetError: 65890:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 13 22:45:55 fir-io7-s1 kernel: LNetError: 65890:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 13 22:50:40 fir-io7-s1 kernel: LNetError: 65220:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 22:50:40 fir-io7-s1 kernel: LNetError: 65220:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 13 22:52:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 13 22:52:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 273 previous similar messages Mar 13 22:55:55 fir-io7-s1 kernel: LNetError: 66247:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 13 22:55:55 fir-io7-s1 kernel: LNetError: 66247:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 13 23:00:40 fir-io7-s1 kernel: LNetError: 66138:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 23:00:40 fir-io7-s1 kernel: LNetError: 66138:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 13 23:02:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 13 23:02:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 276 previous similar messages Mar 13 23:05:55 fir-io7-s1 kernel: LNetError: 66247:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 13 23:05:55 fir-io7-s1 kernel: LNetError: 66247:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 13 23:10:40 fir-io7-s1 kernel: LNetError: 66547:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 23:10:40 fir-io7-s1 kernel: LNetError: 66547:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 13 23:12:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 13 23:12:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 252 previous similar messages Mar 13 23:16:05 fir-io7-s1 kernel: LNetError: 67272:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 23:16:05 fir-io7-s1 kernel: LNetError: 67272:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 13 23:20:40 fir-io7-s1 kernel: LNetError: 67484:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 23:20:40 fir-io7-s1 kernel: LNetError: 67484:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 13 23:22:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 13 23:22:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 435 previous similar messages Mar 13 23:23:37 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client ed18fc06-273e-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4ed18c7800, cur 1584167017 expire 1584166867 last 1584166790 Mar 13 23:23:37 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 23:25:20 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to a282389e-3a6d-4 (at 10.50.6.54@o2ib2) Mar 13 23:25:20 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 13 23:26:10 fir-io7-s1 kernel: LNetError: 67484:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 23:26:10 fir-io7-s1 kernel: LNetError: 67484:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 324 previous similar messages Mar 13 23:30:40 fir-io7-s1 kernel: LNetError: 67780:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 23:30:40 fir-io7-s1 kernel: LNetError: 67780:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 13 23:32:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 13 23:32:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 246 previous similar messages Mar 13 23:36:10 fir-io7-s1 kernel: LNetError: 67272:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 13 23:36:10 fir-io7-s1 kernel: LNetError: 67272:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 311 previous similar messages Mar 13 23:40:40 fir-io7-s1 kernel: LNetError: 67272:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 13 23:40:40 fir-io7-s1 kernel: LNetError: 67272:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 204 previous similar messages Mar 13 23:42:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 5 seconds Mar 13 23:42:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 362 previous similar messages Mar 13 23:46:10 fir-io7-s1 kernel: LNetError: 68313:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 13 23:46:10 fir-io7-s1 kernel: LNetError: 68313:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 328 previous similar messages Mar 13 23:50:40 fir-io7-s1 kernel: LNetError: 69029:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 13 23:50:40 fir-io7-s1 kernel: LNetError: 69029:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 13 23:52:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 1 seconds Mar 13 23:52:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 126 previous similar messages Mar 13 23:56:15 fir-io7-s1 kernel: LNetError: 69189:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 13 23:56:15 fir-io7-s1 kernel: LNetError: 69189:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 14 00:00:40 fir-io7-s1 kernel: LNetError: 69464:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 00:00:40 fir-io7-s1 kernel: LNetError: 69464:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 14 00:02:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 14 00:02:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 113 previous similar messages Mar 14 00:06:15 fir-io7-s1 kernel: LNetError: 69464:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 00:06:15 fir-io7-s1 kernel: LNetError: 69464:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 14 00:10:40 fir-io7-s1 kernel: LNetError: 69229:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 00:10:40 fir-io7-s1 kernel: LNetError: 69229:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 14 00:12:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 3 seconds Mar 14 00:12:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 358 previous similar messages Mar 14 00:16:15 fir-io7-s1 kernel: LNetError: 69736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 00:16:15 fir-io7-s1 kernel: LNetError: 69736:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 324 previous similar messages Mar 14 00:20:40 fir-io7-s1 kernel: LNetError: 69418:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 00:20:40 fir-io7-s1 kernel: LNetError: 69418:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 14 00:22:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 14 00:22:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 304 previous similar messages Mar 14 00:26:15 fir-io7-s1 kernel: LNetError: 70394:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 00:26:15 fir-io7-s1 kernel: LNetError: 70394:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 14 00:30:40 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 00:30:40 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 14 00:32:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 14 00:32:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 218 previous similar messages Mar 14 00:36:15 fir-io7-s1 kernel: LNetError: 70394:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 00:36:15 fir-io7-s1 kernel: LNetError: 70394:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 14 00:40:40 fir-io7-s1 kernel: LNetError: 70810:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 00:40:40 fir-io7-s1 kernel: LNetError: 70810:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 14 00:42:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 14 00:42:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 259 previous similar messages Mar 14 00:46:20 fir-io7-s1 kernel: LNetError: 70810:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 00:46:20 fir-io7-s1 kernel: LNetError: 70810:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 14 00:50:40 fir-io7-s1 kernel: LNetError: 71294:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 00:50:40 fir-io7-s1 kernel: LNetError: 71294:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 14 00:53:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 14 00:53:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 224 previous similar messages Mar 14 00:56:20 fir-io7-s1 kernel: LNetError: 70810:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 00:56:20 fir-io7-s1 kernel: LNetError: 70810:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 14 01:00:40 fir-io7-s1 kernel: LNetError: 70810:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 01:00:40 fir-io7-s1 kernel: LNetError: 70810:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 14 01:03:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 01:03:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 176 previous similar messages Mar 14 01:06:25 fir-io7-s1 kernel: LNetError: 70810:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 01:06:25 fir-io7-s1 kernel: LNetError: 70810:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 330 previous similar messages Mar 14 01:10:40 fir-io7-s1 kernel: LNetError: 71916:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 01:10:40 fir-io7-s1 kernel: LNetError: 71916:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 14 01:13:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 14 01:13:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 167 previous similar messages Mar 14 01:16:30 fir-io7-s1 kernel: LNetError: 71916:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 01:16:30 fir-io7-s1 kernel: LNetError: 71916:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 14 01:20:40 fir-io7-s1 kernel: LNetError: 71916:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 01:20:40 fir-io7-s1 kernel: LNetError: 71916:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 14 01:23:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 3 seconds Mar 14 01:23:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 317 previous similar messages Mar 14 01:26:30 fir-io7-s1 kernel: LNetError: 72565:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 01:26:30 fir-io7-s1 kernel: LNetError: 72565:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 333 previous similar messages Mar 14 01:30:45 fir-io7-s1 kernel: LNetError: 62087:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 01:30:45 fir-io7-s1 kernel: LNetError: 62087:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 201 previous similar messages Mar 14 01:33:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds Mar 14 01:33:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 334 previous similar messages Mar 14 01:36:36 fir-io7-s1 kernel: LNetError: 72565:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 01:36:36 fir-io7-s1 kernel: LNetError: 72565:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 14 01:40:50 fir-io7-s1 kernel: LNetError: 73021:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 01:40:50 fir-io7-s1 kernel: LNetError: 73021:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 14 01:43:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 14 01:43:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 389 previous similar messages Mar 14 01:46:40 fir-io7-s1 kernel: LNetError: 73187:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 01:46:40 fir-io7-s1 kernel: LNetError: 73187:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 351 previous similar messages Mar 14 01:50:50 fir-io7-s1 kernel: LNetError: 73447:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 01:50:50 fir-io7-s1 kernel: LNetError: 73447:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 14 01:54:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 14 01:54:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 128 previous similar messages Mar 14 01:56:40 fir-io7-s1 kernel: LNetError: 73613:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 01:56:40 fir-io7-s1 kernel: LNetError: 73613:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 332 previous similar messages Mar 14 02:00:50 fir-io7-s1 kernel: LNetError: 72810:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 02:00:50 fir-io7-s1 kernel: LNetError: 72810:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 14 02:04:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 14 02:04:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 194 previous similar messages Mar 14 02:06:50 fir-io7-s1 kernel: LNetError: 73613:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 02:06:50 fir-io7-s1 kernel: LNetError: 73613:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages Mar 14 02:10:55 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 02:10:55 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 14 02:14:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 02:14:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 664 previous similar messages Mar 14 02:16:50 fir-io7-s1 kernel: LNetError: 74095:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 02:16:50 fir-io7-s1 kernel: LNetError: 74095:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 326 previous similar messages Mar 14 02:21:00 fir-io7-s1 kernel: LNetError: 74446:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 02:21:00 fir-io7-s1 kernel: LNetError: 74446:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages Mar 14 02:24:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 14 02:24:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 590 previous similar messages Mar 14 02:26:55 fir-io7-s1 kernel: LNetError: 74446:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 02:26:55 fir-io7-s1 kernel: LNetError: 74446:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 331 previous similar messages Mar 14 02:31:00 fir-io7-s1 kernel: LNetError: 74446:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 02:31:00 fir-io7-s1 kernel: LNetError: 74446:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 14 02:34:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 14 02:34:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 517 previous similar messages Mar 14 02:36:55 fir-io7-s1 kernel: LNetError: 74446:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 02:36:55 fir-io7-s1 kernel: LNetError: 74446:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 14 02:41:00 fir-io7-s1 kernel: LNetError: 75157:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 02:41:00 fir-io7-s1 kernel: LNetError: 75157:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 14 02:44:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 02:44:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 541 previous similar messages Mar 14 02:47:00 fir-io7-s1 kernel: LNetError: 75157:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 02:47:00 fir-io7-s1 kernel: LNetError: 75157:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 14 02:51:00 fir-io7-s1 kernel: LNetError: 75512:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 02:51:00 fir-io7-s1 kernel: LNetError: 75512:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 14 02:54:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 14 02:54:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 475 previous similar messages Mar 14 02:57:05 fir-io7-s1 kernel: LNetError: 75512:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 02:57:05 fir-io7-s1 kernel: LNetError: 75512:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 325 previous similar messages Mar 14 03:01:00 fir-io7-s1 kernel: LNetError: 75863:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 03:01:00 fir-io7-s1 kernel: LNetError: 75863:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 14 03:04:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 9 seconds Mar 14 03:04:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 491 previous similar messages Mar 14 03:07:05 fir-io7-s1 kernel: LNetError: 75863:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 03:07:05 fir-io7-s1 kernel: LNetError: 75863:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 14 03:11:00 fir-io7-s1 kernel: LNetError: 76231:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 03:11:00 fir-io7-s1 kernel: LNetError: 76231:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 14 03:15:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 4 seconds Mar 14 03:15:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 433 previous similar messages Mar 14 03:17:10 fir-io7-s1 kernel: LNetError: 76231:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 03:17:10 fir-io7-s1 kernel: LNetError: 76231:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 14 03:21:05 fir-io7-s1 kernel: LNetError: 76583:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 03:21:05 fir-io7-s1 kernel: LNetError: 76583:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 225 previous similar messages Mar 14 03:25:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 8 seconds Mar 14 03:25:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 451 previous similar messages Mar 14 03:27:10 fir-io7-s1 kernel: LNetError: 76583:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 03:27:10 fir-io7-s1 kernel: LNetError: 76583:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 14 03:31:05 fir-io7-s1 kernel: LNetError: 76958:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 03:31:05 fir-io7-s1 kernel: LNetError: 76958:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 14 03:35:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 03:35:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 222 previous similar messages Mar 14 03:37:15 fir-io7-s1 kernel: LNetError: 77241:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 03:37:15 fir-io7-s1 kernel: LNetError: 77241:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 310 previous similar messages Mar 14 03:41:05 fir-io7-s1 kernel: LNetError: 77241:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 03:41:05 fir-io7-s1 kernel: LNetError: 77241:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 14 03:45:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 14 03:45:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 304 previous similar messages Mar 14 03:47:25 fir-io7-s1 kernel: LNetError: 77446:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 03:47:25 fir-io7-s1 kernel: LNetError: 77446:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 321 previous similar messages Mar 14 03:51:05 fir-io7-s1 kernel: LNetError: 77664:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 03:51:05 fir-io7-s1 kernel: LNetError: 77664:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 14 03:55:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 03:55:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 460 previous similar messages Mar 14 03:57:25 fir-io7-s1 kernel: LNetError: 77664:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 03:57:25 fir-io7-s1 kernel: LNetError: 77664:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 14 04:01:05 fir-io7-s1 kernel: LNetError: 78014:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 04:01:05 fir-io7-s1 kernel: LNetError: 78014:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 14 04:05:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 04:05:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 389 previous similar messages Mar 14 04:07:25 fir-io7-s1 kernel: LNetError: 78014:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 04:07:25 fir-io7-s1 kernel: LNetError: 78014:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 14 04:11:10 fir-io7-s1 kernel: LNetError: 77952:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 04:11:10 fir-io7-s1 kernel: LNetError: 77952:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 246 previous similar messages Mar 14 04:15:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 14 04:15:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 346 previous similar messages Mar 14 04:17:25 fir-io7-s1 kernel: LNetError: 78386:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 04:17:25 fir-io7-s1 kernel: LNetError: 78386:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 347 previous similar messages Mar 14 04:21:10 fir-io7-s1 kernel: LNetError: 78386:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 04:21:10 fir-io7-s1 kernel: LNetError: 78386:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 14 04:25:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 04:25:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 311 previous similar messages Mar 14 04:27:25 fir-io7-s1 kernel: LNetError: 78872:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 04:27:25 fir-io7-s1 kernel: LNetError: 78872:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 319 previous similar messages Mar 14 04:31:10 fir-io7-s1 kernel: LNetError: 79088:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 04:31:10 fir-io7-s1 kernel: LNetError: 79088:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 14 04:35:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 14 04:35:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 422 previous similar messages Mar 14 04:37:25 fir-io7-s1 kernel: LNetError: 79427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 04:37:25 fir-io7-s1 kernel: LNetError: 79427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 314 previous similar messages Mar 14 04:41:10 fir-io7-s1 kernel: LNetError: 79427:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 04:41:10 fir-io7-s1 kernel: LNetError: 79427:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 14 04:45:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 14 04:45:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 203 previous similar messages Mar 14 04:47:25 fir-io7-s1 kernel: LNetError: 79427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 04:47:25 fir-io7-s1 kernel: LNetError: 79427:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 330 previous similar messages Mar 14 04:51:10 fir-io7-s1 kernel: LNetError: 79782:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 04:51:10 fir-io7-s1 kernel: LNetError: 79782:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 14 04:55:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 14 04:55:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 240 previous similar messages Mar 14 04:57:30 fir-io7-s1 kernel: LNetError: 79782:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 04:57:30 fir-io7-s1 kernel: LNetError: 79782:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 14 05:01:10 fir-io7-s1 kernel: LNetError: 80142:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 05:01:10 fir-io7-s1 kernel: LNetError: 80142:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 14 05:06:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 5 seconds Mar 14 05:06:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 278 previous similar messages Mar 14 05:07:40 fir-io7-s1 kernel: LNetError: 80137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 05:07:40 fir-io7-s1 kernel: LNetError: 80137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 05:11:15 fir-io7-s1 kernel: LNetError: 80506:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 05:11:15 fir-io7-s1 kernel: LNetError: 80506:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 14 05:16:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 4 seconds Mar 14 05:16:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 511 previous similar messages Mar 14 05:17:45 fir-io7-s1 kernel: LNetError: 80506:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 05:17:45 fir-io7-s1 kernel: LNetError: 80506:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 05:21:15 fir-io7-s1 kernel: LNetError: 80865:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 05:21:15 fir-io7-s1 kernel: LNetError: 80865:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 14 05:26:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 14 05:26:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 14 05:27:50 fir-io7-s1 kernel: LNetError: 80865:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 05:27:50 fir-io7-s1 kernel: LNetError: 80865:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 297 previous similar messages Mar 14 05:31:20 fir-io7-s1 kernel: LNetError: 81215:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 05:31:20 fir-io7-s1 kernel: LNetError: 81215:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 14 05:36:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 14 05:36:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 336 previous similar messages Mar 14 05:37:50 fir-io7-s1 kernel: LNetError: 81496:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 05:37:50 fir-io7-s1 kernel: LNetError: 81496:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 14 05:41:20 fir-io7-s1 kernel: LNetError: 81496:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 05:41:20 fir-io7-s1 kernel: LNetError: 81496:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 14 05:46:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 5 seconds Mar 14 05:46:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 215 previous similar messages Mar 14 05:47:55 fir-io7-s1 kernel: LNetError: 81922:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 05:47:55 fir-io7-s1 kernel: LNetError: 81922:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 05:51:20 fir-io7-s1 kernel: LNetError: 82008:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 05:51:20 fir-io7-s1 kernel: LNetError: 82008:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 14 05:56:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 3 seconds Mar 14 05:56:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 498 previous similar messages Mar 14 05:57:55 fir-io7-s1 kernel: LNetError: 82134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 05:57:55 fir-io7-s1 kernel: LNetError: 82134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 06:01:20 fir-io7-s1 kernel: LNetError: 82314:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 06:01:20 fir-io7-s1 kernel: LNetError: 82314:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 14 06:06:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 14 06:06:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 385 previous similar messages Mar 14 06:07:55 fir-io7-s1 kernel: LNetError: 82314:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 06:07:55 fir-io7-s1 kernel: LNetError: 82314:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 06:11:25 fir-io7-s1 kernel: LNetError: 82639:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 06:11:25 fir-io7-s1 kernel: LNetError: 82639:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 14 06:16:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds Mar 14 06:16:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 414 previous similar messages Mar 14 06:17:55 fir-io7-s1 kernel: LNetError: 82639:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 06:17:55 fir-io7-s1 kernel: LNetError: 82639:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 14 06:21:30 fir-io7-s1 kernel: LNetError: 82991:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 06:21:30 fir-io7-s1 kernel: LNetError: 82991:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 14 06:26:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 14 06:26:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages Mar 14 06:28:05 fir-io7-s1 kernel: LNetError: 82991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 06:28:05 fir-io7-s1 kernel: LNetError: 82991:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 319 previous similar messages Mar 14 06:31:35 fir-io7-s1 kernel: LNetError: 83351:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 06:31:35 fir-io7-s1 kernel: LNetError: 83351:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 14 06:36:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 14 06:36:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 266 previous similar messages Mar 14 06:38:05 fir-io7-s1 kernel: LNetError: 83675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 06:38:05 fir-io7-s1 kernel: LNetError: 83675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 14 06:41:35 fir-io7-s1 kernel: LNetError: 83675:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 06:41:35 fir-io7-s1 kernel: LNetError: 83675:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 14 06:46:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 14 06:46:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 265 previous similar messages Mar 14 06:48:15 fir-io7-s1 kernel: LNetError: 83952:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 06:48:15 fir-io7-s1 kernel: LNetError: 83952:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 297 previous similar messages Mar 14 06:51:35 fir-io7-s1 kernel: LNetError: 83952:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 06:51:35 fir-io7-s1 kernel: LNetError: 83952:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 14 06:56:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 2 seconds Mar 14 06:56:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 312 previous similar messages Mar 14 06:58:15 fir-io7-s1 kernel: LNetError: 84166:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 06:58:15 fir-io7-s1 kernel: LNetError: 84166:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 07:01:35 fir-io7-s1 kernel: LNetError: 82727:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 07:01:35 fir-io7-s1 kernel: LNetError: 82727:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 14 07:07:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 14 07:07:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 205 previous similar messages Mar 14 07:08:20 fir-io7-s1 kernel: LNetError: 84394:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 07:08:20 fir-io7-s1 kernel: LNetError: 84394:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 14 07:11:36 fir-io7-s1 kernel: LNetError: 84773:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 07:11:36 fir-io7-s1 kernel: LNetError: 84773:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 14 07:17:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 4 seconds Mar 14 07:17:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 346 previous similar messages Mar 14 07:18:21 fir-io7-s1 kernel: LNetError: 84773:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 07:18:21 fir-io7-s1 kernel: LNetError: 84773:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 305 previous similar messages Mar 14 07:21:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 07:21:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 14 07:27:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds Mar 14 07:27:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 432 previous similar messages Mar 14 07:28:30 fir-io7-s1 kernel: LNetError: 85127:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 07:28:30 fir-io7-s1 kernel: LNetError: 85127:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 14 07:31:40 fir-io7-s1 kernel: LNetError: 85502:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 07:31:40 fir-io7-s1 kernel: LNetError: 85502:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 14 07:37:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 5 seconds Mar 14 07:37:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 421 previous similar messages Mar 14 07:38:40 fir-io7-s1 kernel: LNetError: 62087:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 07:38:40 fir-io7-s1 kernel: LNetError: 62087:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 14 07:41:40 fir-io7-s1 kernel: LNetError: 85707:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 07:41:40 fir-io7-s1 kernel: LNetError: 85707:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 14 07:47:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 14 07:47:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 423 previous similar messages Mar 14 07:48:40 fir-io7-s1 kernel: LNetError: 85966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 07:48:40 fir-io7-s1 kernel: LNetError: 85966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 14 07:51:40 fir-io7-s1 kernel: LNetError: 85966:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 07:51:40 fir-io7-s1 kernel: LNetError: 85966:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages Mar 14 07:57:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 14 07:57:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 241 previous similar messages Mar 14 07:58:50 fir-io7-s1 kernel: LNetError: 85966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 07:58:50 fir-io7-s1 kernel: LNetError: 85966:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 14 08:01:45 fir-io7-s1 kernel: LNetError: 86573:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 08:01:45 fir-io7-s1 kernel: LNetError: 86573:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 14 08:07:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds Mar 14 08:07:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 489 previous similar messages Mar 14 08:08:50 fir-io7-s1 kernel: LNetError: 86862:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 08:08:50 fir-io7-s1 kernel: LNetError: 86862:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 14 08:11:45 fir-io7-s1 kernel: LNetError: 86862:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 08:11:45 fir-io7-s1 kernel: LNetError: 86862:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 14 08:17:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 6 seconds Mar 14 08:17:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 417 previous similar messages Mar 14 08:18:55 fir-io7-s1 kernel: LNetError: 87045:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 08:18:55 fir-io7-s1 kernel: LNetError: 87045:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 14 08:21:45 fir-io7-s1 kernel: LNetError: 87342:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 08:21:45 fir-io7-s1 kernel: LNetError: 87342:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 14 08:27:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 6 seconds Mar 14 08:27:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 471 previous similar messages Mar 14 08:29:00 fir-io7-s1 kernel: LNetError: 87342:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 08:29:00 fir-io7-s1 kernel: LNetError: 87342:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 312 previous similar messages Mar 14 08:31:45 fir-io7-s1 kernel: LNetError: 87342:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 08:31:45 fir-io7-s1 kernel: LNetError: 87342:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 14 08:37:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 14 08:37:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 377 previous similar messages Mar 14 08:39:00 fir-io7-s1 kernel: LNetError: 87342:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 08:39:00 fir-io7-s1 kernel: LNetError: 87342:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 14 08:41:45 fir-io7-s1 kernel: LNetError: 87740:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 08:41:45 fir-io7-s1 kernel: LNetError: 87740:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 14 08:47:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 1 seconds Mar 14 08:47:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 376 previous similar messages Mar 14 08:49:00 fir-io7-s1 kernel: LNetError: 88029:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 08:49:00 fir-io7-s1 kernel: LNetError: 88029:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 14 08:51:45 fir-io7-s1 kernel: LNetError: 88400:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 08:51:45 fir-io7-s1 kernel: LNetError: 88400:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 14 08:56:52 fir-io7-s1 kernel: LustreError: 87564:0:(tgt_grant.c:758:tgt_grant_check()) fir-OST004e: cli b856828c-c4b4-4 claims 393216 GRANT, real grant 0 Mar 14 08:57:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 14 08:57:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 488 previous similar messages Mar 14 08:59:05 fir-io7-s1 kernel: LNetError: 88387:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 08:59:05 fir-io7-s1 kernel: LNetError: 88387:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 14 09:01:50 fir-io7-s1 kernel: LNetError: 88740:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 09:01:50 fir-io7-s1 kernel: LNetError: 88740:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 150 previous similar messages Mar 14 09:07:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 6 seconds Mar 14 09:07:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 584 previous similar messages Mar 14 09:09:15 fir-io7-s1 kernel: LNetError: 88740:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 09:09:15 fir-io7-s1 kernel: LNetError: 88740:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 324 previous similar messages Mar 14 09:11:50 fir-io7-s1 kernel: LNetError: 89102:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 09:11:50 fir-io7-s1 kernel: LNetError: 89102:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages Mar 14 09:17:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 1 seconds Mar 14 09:17:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 542 previous similar messages Mar 14 09:19:15 fir-io7-s1 kernel: LNetError: 89102:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 09:19:15 fir-io7-s1 kernel: LNetError: 89102:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 14 09:21:50 fir-io7-s1 kernel: LNetError: 89451:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 09:21:50 fir-io7-s1 kernel: LNetError: 89451:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages Mar 14 09:27:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 14 09:27:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 461 previous similar messages Mar 14 09:29:15 fir-io7-s1 kernel: LNetError: 89451:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 09:29:15 fir-io7-s1 kernel: LNetError: 89451:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 354 previous similar messages Mar 14 09:31:50 fir-io7-s1 kernel: LNetError: 89803:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 09:31:50 fir-io7-s1 kernel: LNetError: 89803:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 244 previous similar messages Mar 14 09:38:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 2 seconds Mar 14 09:38:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 539 previous similar messages Mar 14 09:39:15 fir-io7-s1 kernel: LNetError: 90148:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 09:39:15 fir-io7-s1 kernel: LNetError: 90148:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 14 09:41:50 fir-io7-s1 kernel: LNetError: 89780:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 09:41:50 fir-io7-s1 kernel: LNetError: 89780:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 221 previous similar messages Mar 14 09:48:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 1 seconds Mar 14 09:48:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages Mar 14 09:49:15 fir-io7-s1 kernel: LNetError: 90345:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 09:49:15 fir-io7-s1 kernel: LNetError: 90345:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 361 previous similar messages Mar 14 09:51:50 fir-io7-s1 kernel: LNetError: 90361:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 09:51:50 fir-io7-s1 kernel: LNetError: 90361:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 14 09:58:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 14 09:58:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 274 previous similar messages Mar 14 09:59:25 fir-io7-s1 kernel: LNetError: 90509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 09:59:25 fir-io7-s1 kernel: LNetError: 90509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 339 previous similar messages Mar 14 10:01:50 fir-io7-s1 kernel: LNetError: 90584:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 10:01:50 fir-io7-s1 kernel: LNetError: 90584:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 14 10:08:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 10:08:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 275 previous similar messages Mar 14 10:09:35 fir-io7-s1 kernel: LNetError: 91136:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 10:09:35 fir-io7-s1 kernel: LNetError: 91136:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 381 previous similar messages Mar 14 10:11:50 fir-io7-s1 kernel: LNetError: 87318:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 10:11:50 fir-io7-s1 kernel: LNetError: 87318:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 14 10:18:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 14 10:18:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 350 previous similar messages Mar 14 10:19:45 fir-io7-s1 kernel: LNetError: 91136:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 10:19:45 fir-io7-s1 kernel: LNetError: 91136:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 14 10:21:50 fir-io7-s1 kernel: LNetError: 91631:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 10:21:50 fir-io7-s1 kernel: LNetError: 91631:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 232 previous similar messages Mar 14 10:25:11 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 10:25:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:26:22 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 2a3cca4f-5eaa-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7787165800, cur 1584206782 expire 1584206632 last 1584206555 Mar 14 10:26:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:28:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 10:28:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 461 previous similar messages Mar 14 10:29:50 fir-io7-s1 kernel: LNetError: 91826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 10:29:50 fir-io7-s1 kernel: LNetError: 91826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 14 10:31:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 10:31:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:31:50 fir-io7-s1 kernel: LNetError: 92035:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 10:31:50 fir-io7-s1 kernel: LNetError: 92035:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 14 10:32:44 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 7f5ea023-25bd-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6bfe4cc800, cur 1584207164 expire 1584207014 last 1584206937 Mar 14 10:32:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:38:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 14 10:38:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 169 previous similar messages Mar 14 10:39:45 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 39179d7c-c449-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79be0c9400, cur 1584207585 expire 1584207435 last 1584207358 Mar 14 10:39:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:39:50 fir-io7-s1 kernel: LNetError: 92035:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 10:39:50 fir-io7-s1 kernel: LNetError: 92035:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 14 10:41:14 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 10:41:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:41:50 fir-io7-s1 kernel: LNetError: 92295:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 10:41:50 fir-io7-s1 kernel: LNetError: 92295:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 14 10:48:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 10:48:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 148 previous similar messages Mar 14 10:49:50 fir-io7-s1 kernel: LNetError: 92336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 10:49:50 fir-io7-s1 kernel: LNetError: 92336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 14 10:51:21 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 10:51:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:51:50 fir-io7-s1 kernel: LNetError: 92688:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 10:51:50 fir-io7-s1 kernel: LNetError: 92688:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 14 10:52:34 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 0240e951-418a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990b8a000, cur 1584208354 expire 1584208204 last 1584208127 Mar 14 10:52:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:58:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 14 10:58:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 194 previous similar messages Mar 14 10:59:19 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 373e6943-bbe7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6992d3e400, cur 1584208759 expire 1584208609 last 1584208532 Mar 14 10:59:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 10:59:55 fir-io7-s1 kernel: LNetError: 92890:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 10:59:55 fir-io7-s1 kernel: LNetError: 92890:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 14 11:00:24 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:00:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:01:55 fir-io7-s1 kernel: LNetError: 92890:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 11:01:55 fir-io7-s1 kernel: LNetError: 92890:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 14 11:08:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 14 11:08:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 136 previous similar messages Mar 14 11:09:55 fir-io7-s1 kernel: LNetError: 93144:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 11:09:55 fir-io7-s1 kernel: LNetError: 93144:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 390 previous similar messages Mar 14 11:10:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:10:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:11:44 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 31faaac2-0473-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7c1a9b9c00, cur 1584209504 expire 1584209354 last 1584209277 Mar 14 11:11:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:11:55 fir-io7-s1 kernel: LNetError: 93438:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 11:11:55 fir-io7-s1 kernel: LNetError: 93438:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 14 11:18:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 14 11:18:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 584 previous similar messages Mar 14 11:20:00 fir-io7-s1 kernel: LNetError: 93750:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 11:20:00 fir-io7-s1 kernel: LNetError: 93750:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 14 11:21:55 fir-io7-s1 kernel: LNetError: 93750:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 11:21:55 fir-io7-s1 kernel: LNetError: 93750:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 14 11:22:23 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:22:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:23:25 fir-io7-s1 kernel: LustreError: 68657:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.26.4@o2ib1) returned error from glimpse AST (req@ffff9c4a4250cc80 x1652475666163248 status -107 rc -107), evict it ns: filter-fir-OST0052_UUID lock: ffff9c8435ea0b40/0x3bd9b8527c7df398 lrc: 3/0,0 mode: PW/PW res: [0x1780000402:0x1105f13:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->134217727) flags: 0x40000080000000 nid: 10.49.26.4@o2ib1 remote: 0xf12f8729ac94e6d3 expref: 5 pid: 68572 timeout: 0 lvb_type: 0 Mar 14 11:23:25 fir-io7-s1 kernel: LustreError: 138-a: fir-OST0052: A client on nid 10.49.26.4@o2ib1 was evicted due to a lock glimpse callback time out: rc -107 Mar 14 11:23:25 fir-io7-s1 kernel: LustreError: 66897:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1584210205s: evicting client at 10.49.26.4@o2ib1 ns: filter-fir-OST0052_UUID lock: ffff9c8435ea0b40/0x3bd9b8527c7df398 lrc: 3/0,0 mode: PW/PW res: [0x1780000402:0x1105f13:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->134217727) flags: 0x40000080000000 nid: 10.49.26.4@o2ib1 remote: 0xf12f8729ac94e6d3 expref: 6 pid: 68572 timeout: 0 lvb_type: 0 Mar 14 11:23:43 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client be14d42b-05d4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990b3b400, cur 1584210223 expire 1584210073 last 1584209996 Mar 14 11:23:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:28:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:28:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:28:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 11:28:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 718 previous similar messages Mar 14 11:29:06 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client cdb8004e-5753-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6bfe4cb000, cur 1584210546 expire 1584210396 last 1584210319 Mar 14 11:29:06 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 11:30:00 fir-io7-s1 kernel: LNetError: 93957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 11:30:00 fir-io7-s1 kernel: LNetError: 93957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 363 previous similar messages Mar 14 11:31:55 fir-io7-s1 kernel: LNetError: 93957:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 11:31:55 fir-io7-s1 kernel: LNetError: 93957:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 14 11:36:00 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client f5a7b0f4-ff8a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ee28fd400, cur 1584210960 expire 1584210810 last 1584210733 Mar 14 11:36:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:37:16 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:37:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:38:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 14 11:38:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 474 previous similar messages Mar 14 11:40:00 fir-io7-s1 kernel: LNetError: 94258:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 11:40:00 fir-io7-s1 kernel: LNetError: 94258:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 366 previous similar messages Mar 14 11:41:55 fir-io7-s1 kernel: LNetError: 94476:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 11:41:55 fir-io7-s1 kernel: LNetError: 94476:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 14 11:46:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:46:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:48:10 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 77760abe-efee-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ae1d7a400, cur 1584211690 expire 1584211540 last 1584211463 Mar 14 11:48:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:48:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 2 seconds Mar 14 11:48:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 268 previous similar messages Mar 14 11:50:05 fir-io7-s1 kernel: LNetError: 94703:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 11:50:05 fir-io7-s1 kernel: LNetError: 94703:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 337 previous similar messages Mar 14 11:52:00 fir-io7-s1 kernel: LNetError: 94703:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 11:52:00 fir-io7-s1 kernel: LNetError: 94703:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 14 11:52:31 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:52:31 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:53:35 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 6883dad6-28e9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c88d07db000, cur 1584212015 expire 1584211865 last 1584211788 Mar 14 11:53:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 11:58:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 2 seconds Mar 14 11:58:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 359 previous similar messages Mar 14 11:59:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 11:59:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:00:10 fir-io7-s1 kernel: LNetError: 94904:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 12:00:10 fir-io7-s1 kernel: LNetError: 94904:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 14 12:00:30 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client c271795f-0b97-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a3e985000, cur 1584212430 expire 1584212280 last 1584212203 Mar 14 12:00:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:02:00 fir-io7-s1 kernel: LNetError: 95195:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 12:02:00 fir-io7-s1 kernel: LNetError: 95195:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 14 12:07:34 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client ec83caf2-c049-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ae1d7cc00, cur 1584212854 expire 1584212704 last 1584212627 Mar 14 12:07:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:08:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 12:08:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:08:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 14 12:08:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 334 previous similar messages Mar 14 12:10:20 fir-io7-s1 kernel: LNetError: 95195:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 12:10:20 fir-io7-s1 kernel: LNetError: 95195:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 14 12:12:00 fir-io7-s1 kernel: LNetError: 95562:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 12:12:00 fir-io7-s1 kernel: LNetError: 95562:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 14 12:18:29 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 12:18:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:18:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 6 seconds Mar 14 12:18:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 423 previous similar messages Mar 14 12:19:42 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 6d61d4f6-6e43-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698e9be000, cur 1584213582 expire 1584213432 last 1584213355 Mar 14 12:19:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:20:25 fir-io7-s1 kernel: LNetError: 95562:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 12:20:25 fir-io7-s1 kernel: LNetError: 95562:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 14 12:22:00 fir-io7-s1 kernel: LNetError: 95650:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 12:22:00 fir-io7-s1 kernel: LNetError: 95650:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 14 12:28:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 14 12:28:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 373 previous similar messages Mar 14 12:30:25 fir-io7-s1 kernel: LNetError: 95923:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 12:30:25 fir-io7-s1 kernel: LNetError: 95923:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 14 12:32:00 fir-io7-s1 kernel: LNetError: 96275:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 12:32:00 fir-io7-s1 kernel: LNetError: 96275:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 14 12:38:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 14 12:38:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 321 previous similar messages Mar 14 12:40:25 fir-io7-s1 kernel: LNetError: 96586:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 12:40:25 fir-io7-s1 kernel: LNetError: 96586:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 303 previous similar messages Mar 14 12:42:00 fir-io7-s1 kernel: LNetError: 96586:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 12:42:00 fir-io7-s1 kernel: LNetError: 96586:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 14 12:44:40 fir-io7-s1 kernel: Lustre: 101323:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1584215069/real 1584215069] req@ffff9c4a49061f80 x1652475672590336/t0(0) o106->fir-OST0048@10.49.26.4@o2ib1:15/16 lens 296/280 e 0 to 1 dl 1584215080 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Mar 14 12:44:40 fir-io7-s1 kernel: Lustre: 101323:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Mar 14 12:45:02 fir-io7-s1 kernel: Lustre: 101323:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1584215091/real 1584215091] req@ffff9c4a49061f80 x1652475672590336/t0(0) o106->fir-OST0048@10.49.26.4@o2ib1:15/16 lens 296/280 e 0 to 1 dl 1584215102 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 14 12:45:02 fir-io7-s1 kernel: Lustre: 101323:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 14 12:45:35 fir-io7-s1 kernel: Lustre: 
101323:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1584215124/real 1584215124] req@ffff9c4a49061f80 x1652475672590336/t0(0) o106->fir-OST0048@10.49.26.4@o2ib1:15/16 lens 296/280 e 0 to 1 dl 1584215135 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 14 12:45:35 fir-io7-s1 kernel: Lustre: 101323:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 14 12:45:35 fir-io7-s1 kernel: LustreError: 101323:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.26.4@o2ib1) returned error from glimpse AST (req@ffff9c4a49061f80 x1652475672590336 status -107 rc -107), evict it ns: filter-fir-OST0048_UUID lock: ffff9c5cce5921c0/0x3bd9b8527d087ee6 lrc: 3/0,0 mode: PW/PW res: [0x1500000402:0x68ed08:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 34359738368->18446744073709551615) flags: 0x40000080000000 nid: 10.49.26.4@o2ib1 remote: 0x29572d611fea9fda expref: 5 pid: 68826 timeout: 0 lvb_type: 0 Mar 14 12:45:35 fir-io7-s1 kernel: LustreError: 138-a: fir-OST0048: A client on nid 10.49.26.4@o2ib1 was evicted due to a lock glimpse callback time out: rc -107 Mar 14 12:45:35 fir-io7-s1 kernel: LustreError: 66897:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1584215135s: evicting client at 10.49.26.4@o2ib1 ns: filter-fir-OST0048_UUID lock: ffff9c5cce5921c0/0x3bd9b8527d087ee6 lrc: 3/0,0 mode: PW/PW res: [0x1500000402:0x68ed08:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 34359738368->18446744073709551615) flags: 0x40000080000000 nid: 10.49.26.4@o2ib1 remote: 0x29572d611fea9fda expref: 6 pid: 68826 timeout: 0 lvb_type: 0 Mar 14 12:45:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 12:45:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:45:41 fir-io7-s1 kernel: LustreError: 85257:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.26.4@o2ib1) 
returned error from glimpse AST (req@ffff9c4a3ff1f980 x1652475672688144 status -107 rc -107), evict it ns: filter-fir-OST0052_UUID lock: ffff9c6395faa1c0/0x3bd9b8527d08db13 lrc: 3/0,0 mode: PW/PW res: [0x1780000401:0x769e4a:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0x40000000020000 nid: 10.49.26.4@o2ib1 remote: 0x29572d611feaabb8 expref: 8 pid: 68636 timeout: 0 lvb_type: 0 Mar 14 12:45:41 fir-io7-s1 kernel: LustreError: 138-a: fir-OST0052: A client on nid 10.49.26.4@o2ib1 was evicted due to a lock glimpse callback time out: rc -107 Mar 14 12:45:41 fir-io7-s1 kernel: LustreError: 66897:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1584215141s: evicting client at 10.49.26.4@o2ib1 ns: filter-fir-OST0052_UUID lock: ffff9c6395faa1c0/0x3bd9b8527d08db13 lrc: 3/0,0 mode: PW/PW res: [0x1780000401:0x769e4a:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0x40000000020000 nid: 10.49.26.4@o2ib1 remote: 0x29572d611feaabb8 expref: 9 pid: 68636 timeout: 0 lvb_type: 0 Mar 14 12:46:12 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 96e36c80-37e9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7df1067c00, cur 1584215172 expire 1584215022 last 1584214945 Mar 14 12:46:12 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 12:46:32 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 96e36c80-37e9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990a1cc00, cur 1584215192 expire 1584215042 last 1584214965 Mar 14 12:46:47 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 96e36c80-37e9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ae1d7bc00, cur 1584215207 expire 1584215057 last 1584214980 Mar 14 12:49:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 6 seconds Mar 14 12:49:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 461 previous similar messages Mar 14 12:50:30 fir-io7-s1 kernel: LNetError: 96586:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 12:50:30 fir-io7-s1 kernel: LNetError: 96586:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 14 12:52:00 fir-io7-s1 kernel: LNetError: 96978:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 12:52:00 fir-io7-s1 kernel: LNetError: 96978:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 14 12:59:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 11 seconds Mar 14 12:59:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 378 previous similar messages Mar 14 12:59:25 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 3d1bc114-5c52-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698e626000, cur 1584215965 expire 1584215815 last 1584215738 Mar 14 12:59:25 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 14 12:59:50 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 12:59:50 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 13:00:30 fir-io7-s1 kernel: LNetError: 96978:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 13:00:30 fir-io7-s1 kernel: LNetError: 96978:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 14 13:02:05 fir-io7-s1 kernel: LNetError: 97326:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 13:02:05 fir-io7-s1 kernel: LNetError: 97326:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 14 13:09:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 13:09:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 522 previous similar messages Mar 14 13:10:40 fir-io7-s1 kernel: LNetError: 97579:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 13:10:40 fir-io7-s1 kernel: LNetError: 97579:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 13:12:05 fir-io7-s1 kernel: LNetError: 97579:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 13:12:05 fir-io7-s1 kernel: LNetError: 97579:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 14 13:13:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 13:13:42 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 13:14:05 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 573949f8-4e3c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990f70400, cur 1584216845 expire 1584216695 last 1584216618 Mar 14 13:14:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:14:11 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 573949f8-4e3c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79ff2aa400, cur 1584216851 expire 1584216701 last 1584216624 Mar 14 13:14:11 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 13:19:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 14 13:19:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 454 previous similar messages Mar 14 13:20:40 fir-io7-s1 kernel: LNetError: 97579:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 13:20:40 fir-io7-s1 kernel: LNetError: 97579:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 14 13:22:05 fir-io7-s1 kernel: LNetError: 98073:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 13:22:05 fir-io7-s1 kernel: LNetError: 98073:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 14 13:22:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 13:22:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:23:22 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client b2dac5d0-7f05-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c699221c400, cur 1584217402 expire 1584217252 last 1584217175 Mar 14 13:29:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 14 13:29:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 368 previous similar messages Mar 14 13:30:40 fir-io7-s1 kernel: LNetError: 98073:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 13:30:40 fir-io7-s1 kernel: LNetError: 98073:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 13:31:00 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client d3fcd5aa-a399-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c70682c5000, cur 1584217860 expire 1584217710 last 1584217633 Mar 14 13:31:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:32:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 13:32:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:32:05 fir-io7-s1 kernel: LNetError: 97438:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 13:32:05 fir-io7-s1 kernel: LNetError: 97438:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 14 13:39:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 14 13:39:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 476 previous similar messages Mar 14 13:40:40 fir-io7-s1 kernel: LNetError: 8893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 13:40:40 fir-io7-s1 kernel: LNetError: 8893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 14 13:42:05 fir-io7-s1 kernel: LNetError: 98425:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 13:42:05 fir-io7-s1 kernel: LNetError: 98425:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 14 13:44:40 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client a9d140b4-373f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7c1a9b9800, cur 1584218680 expire 1584218530 last 1584218453 Mar 14 13:44:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:45:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 13:45:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:49:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 14 13:49:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 469 previous similar messages Mar 14 13:50:40 fir-io7-s1 kernel: LNetError: 98847:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 13:50:40 fir-io7-s1 kernel: LNetError: 98847:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 13:52:05 fir-io7-s1 kernel: LNetError: 97438:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 13:52:05 fir-io7-s1 kernel: LNetError: 97438:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 154 previous similar messages Mar 14 13:57:50 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 13:57:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:58:29 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 3ae5255f-bd2a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69922e5800, cur 1584219509 expire 1584219359 last 1584219282 Mar 14 13:58:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 13:58:59 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 3ae5255f-bd2a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6f6848ec00, cur 1584219539 expire 1584219389 last 1584219312 Mar 14 13:58:59 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 14 13:59:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 4 seconds Mar 14 13:59:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 535 previous similar messages Mar 14 14:00:50 fir-io7-s1 kernel: LNetError: 99461:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 14:00:50 fir-io7-s1 kernel: LNetError: 99461:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 14:02:05 fir-io7-s1 kernel: LNetError: 99461:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 14:02:05 fir-io7-s1 kernel: LNetError: 99461:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 152 previous similar messages Mar 14 14:09:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 7 seconds Mar 14 14:09:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 498 previous similar messages Mar 14 14:11:00 fir-io7-s1 kernel: LNetError: 99461:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 14:11:00 fir-io7-s1 kernel: LNetError: 99461:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 14:11:25 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 14:11:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 14:12:10 fir-io7-s1 kernel: LNetError: 99880:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 14:12:10 fir-io7-s1 kernel: LNetError: 99880:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 14 14:12:30 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b8de01d0-02e9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c78e1f1f400, cur 1584220350 expire 1584220200 last 1584220123 Mar 14 14:12:30 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 14 14:12:37 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client b8de01d0-02e9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a1be89400, cur 1584220357 expire 1584220207 last 1584220130 Mar 14 14:12:37 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 14:19:25 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 14:19:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 14:19:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 5 seconds Mar 14 14:19:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 508 previous similar messages Mar 14 14:20:39 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 023a05fd-5f3e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6a39f66400, cur 1584220839 expire 1584220689 last 1584220612 Mar 14 14:21:05 fir-io7-s1 kernel: LNetError: 99880:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 14:21:05 fir-io7-s1 kernel: LNetError: 99880:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 14:22:20 fir-io7-s1 kernel: LNetError: 100324:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 14:22:20 fir-io7-s1 kernel: LNetError: 100324:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 14 14:29:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 10 seconds Mar 14 14:29:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 470 previous similar messages Mar 14 14:31:05 fir-io7-s1 kernel: LNetError: 100324:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 14:31:05 fir-io7-s1 kernel: LNetError: 100324:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 14 14:32:20 fir-io7-s1 kernel: LNetError: 100672:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 14:32:20 fir-io7-s1 kernel: LNetError: 100672:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 14 14:33:12 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 14:33:12 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 14:34:30 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client a62d8d43-5537-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990d99800, cur 1584221670 expire 1584221520 last 1584221443 Mar 14 14:34:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 14:39:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 6 seconds Mar 14 14:39:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 358 previous similar messages Mar 14 14:41:05 fir-io7-s1 kernel: LNetError: 100672:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 14:41:05 fir-io7-s1 kernel: LNetError: 100672:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 14 14:42:20 fir-io7-s1 kernel: LNetError: 101021:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 14:42:20 fir-io7-s1 kernel: LNetError: 101021:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 14 14:49:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 2 seconds Mar 14 14:49:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 395 previous similar messages Mar 14 14:51:05 fir-io7-s1 kernel: LNetError: 101382:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 14:51:05 fir-io7-s1 kernel: LNetError: 101382:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 14:52:20 fir-io7-s1 kernel: LNetError: 101235:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 14:52:20 fir-io7-s1 kernel: LNetError: 101235:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 14 14:53:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 14:53:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 14:53:18 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 42b41428-98e5-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c88144e2400, cur 1584222798 expire 1584222648 last 1584222571 Mar 14 14:53:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 14:59:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 14:59:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:00:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 5 seconds Mar 14 15:00:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 369 previous similar messages Mar 14 15:01:00 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client ad501daa-2951-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e500c00, cur 1584223260 expire 1584223110 last 1584223033 Mar 14 15:01:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:01:15 fir-io7-s1 kernel: LNetError: 101561:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 15:01:15 fir-io7-s1 kernel: LNetError: 101561:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 15:02:20 fir-io7-s1 kernel: LNetError: 101779:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 15:02:20 fir-io7-s1 kernel: LNetError: 101779:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 14 15:07:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 15:07:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:09:05 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 81483f19-1aea-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c74f34c0c00, cur 1584223745 expire 1584223595 last 1584223518 Mar 14 15:09:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:09:15 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 81483f19-1aea-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c74f34c5000, cur 1584223755 expire 1584223605 last 1584223528 Mar 14 15:09:15 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 15:10:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 7 seconds Mar 14 15:10:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages Mar 14 15:11:20 fir-io7-s1 kernel: LNetError: 101779:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 15:11:20 fir-io7-s1 kernel: LNetError: 101779:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 14 15:12:20 fir-io7-s1 kernel: LNetError: 102133:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 15:12:20 fir-io7-s1 kernel: LNetError: 102133:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 14 15:19:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 15:19:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:19:32 fir-io7-s1 kernel: LustreError: 68657:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.26.4@o2ib1) returned error from glimpse AST (req@ffff9c4a4ccd0d80 x1652475689400128 status -107 rc -107), evict it ns: filter-fir-OST004e_UUID lock: ffff9c5d4d79ee40/0x3bd9b8527e371ea9 lrc: 3/0,0 mode: PW/PW res: [0x1680000400:0x1103960:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 34359738368->18446744073709551615) flags: 0x40000000020000 nid: 10.49.26.4@o2ib1 remote: 0x9bdfe907465fa865 expref: 6 pid: 68574 timeout: 0 lvb_type: 0 Mar 14 15:19:32 fir-io7-s1 kernel: LustreError: 138-a: fir-OST0050: A client on nid 10.49.26.4@o2ib1 was evicted due to a lock glimpse callback time out: rc -107 Mar 14 15:19:32 fir-io7-s1 kernel: LustreError: 66897:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1584224372s: evicting client at 10.49.26.4@o2ib1 ns: filter-fir-OST0050_UUID lock: ffff9c7ae06157c0/0x3bd9b8527e372586 lrc: 3/0,0 mode: PW/PW res: [0x1700000401:0x10f6e26:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->134217727) flags: 0x40000080000000 nid: 10.49.26.4@o2ib1 remote: 0x9bdfe907465fa896 expref: 6 pid: 68366 timeout: 0 lvb_type: 0 Mar 14 15:19:32 fir-io7-s1 kernel: LustreError: 68657:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message Mar 14 15:20:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 14 15:20:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 354 previous similar messages Mar 14 15:20:29 fir-io7-s1 kernel: 
Lustre: fir-OST0052: haven't heard from client c95b0868-9590-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7cdaa51400, cur 1584224429 expire 1584224279 last 1584224202 Mar 14 15:21:25 fir-io7-s1 kernel: LNetError: 102133:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 15:21:25 fir-io7-s1 kernel: LNetError: 102133:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 14 15:22:25 fir-io7-s1 kernel: LNetError: 102497:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 15:22:25 fir-io7-s1 kernel: LNetError: 102497:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 14 15:25:29 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 15:25:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:26:31 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client c36352bd-661d-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd491400, cur 1584224791 expire 1584224641 last 1584224564 Mar 14 15:26:31 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 14 15:30:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 15:30:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 352 previous similar messages Mar 14 15:31:35 fir-io7-s1 kernel: LNetError: 102497:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 15:31:35 fir-io7-s1 kernel: LNetError: 102497:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 14 15:32:30 fir-io7-s1 kernel: LNetError: 102849:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 15:32:30 fir-io7-s1 kernel: LNetError: 102849:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 14 15:34:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 15:34:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:35:08 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 96350b8e-b3ac-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990e11c00, cur 1584225308 expire 1584225158 last 1584225081 Mar 14 15:35:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:40:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 14 15:40:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 398 previous similar messages Mar 14 15:41:35 fir-io7-s1 kernel: LNetError: 103134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 15:41:35 fir-io7-s1 kernel: LNetError: 103134:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 15:42:30 fir-io7-s1 kernel: LNetError: 103134:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 15:42:30 fir-io7-s1 kernel: LNetError: 103134:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 14 15:42:39 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 15:42:39 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:43:02 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client b6e72754-7b66-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c828ebd8400, cur 1584225782 expire 1584225632 last 1584225555 Mar 14 15:43:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:50:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 5 seconds Mar 14 15:50:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 433 previous similar messages Mar 14 15:51:35 fir-io7-s1 kernel: LNetError: 103457:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 15:51:35 fir-io7-s1 kernel: LNetError: 103457:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 15:52:35 fir-io7-s1 kernel: LNetError: 103457:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 15:52:35 fir-io7-s1 kernel: LNetError: 103457:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 14 15:55:19 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 15:55:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 15:56:29 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 0c056e66-d0aa-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6992e80800, cur 1584226589 expire 1584226439 last 1584226362 Mar 14 15:56:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 16:00:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 14 16:00:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 476 previous similar messages Mar 14 16:01:40 fir-io7-s1 kernel: LNetError: 103726:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 16:01:40 fir-io7-s1 kernel: LNetError: 103726:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 16:02:35 fir-io7-s1 kernel: LNetError: 103778:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 16:02:35 fir-io7-s1 kernel: LNetError: 103778:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 14 16:05:46 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 16:05:46 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 16:06:14 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 4f837ea0-3696-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c74f34c0c00, cur 1584227174 expire 1584227024 last 1584226947 Mar 14 16:06:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 16:10:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 8 seconds Mar 14 16:10:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 476 previous similar messages Mar 14 16:11:45 fir-io7-s1 kernel: LNetError: 8893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 16:11:45 fir-io7-s1 kernel: LNetError: 8893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 14 16:12:35 fir-io7-s1 kernel: LNetError: 102678:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 16:12:35 fir-io7-s1 kernel: LNetError: 102678:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 14 16:17:04 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 16:17:04 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 14 16:17:55 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 717affbe-9e39-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cebb1000, cur 1584227875 expire 1584227725 last 1584227648 Mar 14 16:17:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 16:20:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 14 16:20:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 465 previous similar messages Mar 14 16:21:50 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 16:21:50 fir-io7-s1 kernel: LNetError: 104287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 16:22:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 16:22:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 154 previous similar messages Mar 14 16:30:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 14 16:30:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 428 previous similar messages Mar 14 16:31:50 fir-io7-s1 kernel: LNetError: 104622:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 16:31:50 fir-io7-s1 kernel: LNetError: 104622:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 14 16:32:41 fir-io7-s1 kernel: LNetError: 104428:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 16:32:41 fir-io7-s1 kernel: LNetError: 104428:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 14 16:40:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 16:40:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 424 previous similar messages Mar 14 16:42:01 fir-io7-s1 kernel: LNetError: 104969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 16:42:01 fir-io7-s1 kernel: LNetError: 104969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 16:42:46 fir-io7-s1 kernel: LNetError: 105326:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 16:42:46 fir-io7-s1 kernel: LNetError: 105326:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 14 16:50:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 5 seconds Mar 14 16:50:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 390 previous similar messages Mar 14 16:52:11 fir-io7-s1 kernel: LNetError: 105326:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 16:52:11 fir-io7-s1 kernel: LNetError: 105326:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 14 16:52:46 fir-io7-s1 kernel: LNetError: 105510:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 16:52:46 fir-io7-s1 kernel: LNetError: 105510:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 14 17:01:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 14 17:01:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 419 previous similar messages Mar 14 17:02:20 fir-io7-s1 kernel: LNetError: 105694:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 17:02:20 fir-io7-s1 kernel: LNetError: 105694:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 14 17:02:50 fir-io7-s1 kernel: LNetError: 106061:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 17:02:50 fir-io7-s1 kernel: LNetError: 106061:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 150 previous similar messages Mar 14 17:11:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 14 17:11:09 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 291 previous similar messages Mar 14 17:12:20 fir-io7-s1 kernel: LNetError: 106061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 17:12:20 fir-io7-s1 kernel: LNetError: 106061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 17:12:50 fir-io7-s1 kernel: LNetError: 106411:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 17:12:50 fir-io7-s1 kernel: LNetError: 106411:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 14 17:19:25 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client c6d54dfb-a66d-4 (at 10.49.26.4@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c69929ce800, cur 1584231565 expire 1584231415 last 1584231338 Mar 14 17:19:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:20:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 17:20:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:21:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 10 seconds Mar 14 17:21:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 197 previous similar messages Mar 14 17:22:25 fir-io7-s1 kernel: LNetError: 106411:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 17:22:25 fir-io7-s1 kernel: LNetError: 106411:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 14 17:22:55 fir-io7-s1 kernel: LNetError: 106775:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 17:22:55 fir-io7-s1 kernel: LNetError: 106775:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 14 17:31:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 6 seconds Mar 14 17:31:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 245 previous similar messages Mar 14 17:32:25 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 17:32:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:32:30 fir-io7-s1 kernel: LNetError: 105694:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 17:32:30 fir-io7-s1 kernel: LNetError: 105694:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 17:32:55 fir-io7-s1 kernel: LNetError: 106430:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 17:32:55 fir-io7-s1 kernel: LNetError: 106430:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 14 17:33:35 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client c78f0434-b623-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69903b4000, cur 1584232415 expire 1584232265 last 1584232188 Mar 14 17:33:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:40:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 17:40:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:41:14 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 2cdd7837-c275-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6f91f91800, cur 1584232874 expire 1584232724 last 1584232647 Mar 14 17:41:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:41:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 14 17:41:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 351 previous similar messages Mar 14 17:42:35 fir-io7-s1 kernel: LNetError: 107289:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 17:42:35 fir-io7-s1 kernel: LNetError: 107289:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 14 17:43:00 fir-io7-s1 kernel: LNetError: 106702:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 17:43:00 fir-io7-s1 kernel: LNetError: 106702:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 14 17:46:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 17:46:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:47:30 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client c9616f6b-0f0a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c607894b000, cur 1584233250 expire 1584233100 last 1584233023 Mar 14 17:47:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:51:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 14 17:51:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 407 previous similar messages Mar 14 17:52:45 fir-io7-s1 kernel: LNetError: 107478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 17:52:45 fir-io7-s1 kernel: LNetError: 107478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 17:53:00 fir-io7-s1 kernel: LNetError: 107834:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 17:53:00 fir-io7-s1 kernel: LNetError: 107834:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 155 previous similar messages Mar 14 17:55:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 17:55:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 17:56:02 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b258a64d-e7e3-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8182fb8800, cur 1584233762 expire 1584233612 last 1584233535 Mar 14 17:56:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:01:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 14 18:01:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 353 previous similar messages Mar 14 18:02:50 fir-io7-s1 kernel: LNetError: 107834:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 18:02:50 fir-io7-s1 kernel: LNetError: 107834:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 18:03:05 fir-io7-s1 kernel: LNetError: 107834:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 18:03:05 fir-io7-s1 kernel: LNetError: 107834:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 14 18:05:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 18:05:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:06:01 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client e2310b24-8482-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c69923ad400, cur 1584234361 expire 1584234211 last 1584234134 Mar 14 18:06:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:12:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 14 18:12:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 323 previous similar messages Mar 14 18:12:50 fir-io7-s1 kernel: LNetError: 107834:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 18:12:50 fir-io7-s1 kernel: LNetError: 107834:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 18:13:05 fir-io7-s1 kernel: LNetError: 108555:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 18:13:05 fir-io7-s1 kernel: LNetError: 108555:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 14 18:18:09 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 658b0513-13f6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7b14730800, cur 1584235089 expire 1584234939 last 1584234862 Mar 14 18:18:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:19:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 18:19:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:22:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 6 seconds Mar 14 18:22:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 291 previous similar messages Mar 14 18:22:55 fir-io7-s1 kernel: LNetError: 108555:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 18:22:55 fir-io7-s1 kernel: LNetError: 108555:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 18:23:05 fir-io7-s1 kernel: LNetError: 107315:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 18:23:05 fir-io7-s1 kernel: LNetError: 107315:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 14 18:32:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 3 seconds Mar 14 18:32:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 399 previous similar messages Mar 14 18:33:05 fir-io7-s1 kernel: LNetError: 109244:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 18:33:05 fir-io7-s1 kernel: LNetError: 109244:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 18:33:05 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 18:33:05 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 14 18:33:45 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 18:33:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:34:52 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 2f6e8ee3-08d7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6991d61000, cur 1584236092 expire 1584235942 last 1584235865 Mar 14 18:34:52 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:42:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 7 seconds Mar 14 18:42:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 14 18:43:05 fir-io7-s1 kernel: LNetError: 109550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 18:43:05 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 18:43:05 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 14 18:43:05 fir-io7-s1 kernel: LNetError: 109550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 14 18:43:50 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 27c8a355-9970-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c812c96d800, cur 1584236630 expire 1584236480 last 1584236403 Mar 14 18:43:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:44:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 18:44:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:52:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 6 seconds Mar 14 18:52:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 813 previous similar messages Mar 14 18:53:10 fir-io7-s1 kernel: LNetError: 109550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 18:53:10 fir-io7-s1 kernel: LNetError: 109550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 18:53:10 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 18:53:10 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 14 18:58:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 18:58:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 18:59:48 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client cb05a300-7954-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c768773b000, cur 1584237588 expire 1584237438 last 1584237361 Mar 14 18:59:48 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:02:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 14 19:02:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 797 previous similar messages Mar 14 19:03:15 fir-io7-s1 kernel: LNetError: 109974:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 19:03:15 fir-io7-s1 kernel: LNetError: 109974:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 19:03:15 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 19:03:15 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 77 previous similar messages Mar 14 19:07:24 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 559fc0bb-8b5e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6bd84fb800, cur 1584238044 expire 1584237894 last 1584237817 Mar 14 19:07:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:08:38 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 19:08:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:12:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 7 seconds Mar 14 19:12:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 740 previous similar messages Mar 14 19:13:20 fir-io7-s1 kernel: LNetError: 110526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 19:13:20 fir-io7-s1 kernel: LNetError: 110526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 14 19:13:20 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 19:13:20 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 93 previous similar messages Mar 14 19:22:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 6 seconds Mar 14 19:22:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 568 previous similar messages Mar 14 19:23:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 19:23:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:23:25 fir-io7-s1 kernel: LNetError: 110950:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 19:23:25 fir-io7-s1 kernel: LNetError: 110950:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 121 previous similar messages Mar 14 19:23:30 fir-io7-s1 kernel: LNetError: 110526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 19:23:30 fir-io7-s1 kernel: LNetError: 110526:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 19:24:08 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client f211e235-864c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c73314d8c00, cur 1584239048 expire 1584238898 last 1584238821 Mar 14 19:24:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:32:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 14 19:32:40 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 712 previous similar messages Mar 14 19:33:29 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 77383d01-8335-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7513088000, cur 1584239609 expire 1584239459 last 1584239382 Mar 14 19:33:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:33:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 19:33:30 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 91 previous similar messages Mar 14 19:33:35 fir-io7-s1 kernel: LNetError: 111053:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 19:33:35 fir-io7-s1 kernel: LNetError: 111053:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 19:34:31 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 19:34:31 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 19:42:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 19:42:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 533 previous similar messages Mar 14 19:43:35 fir-io7-s1 kernel: LNetError: 111771:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 19:43:35 fir-io7-s1 kernel: LNetError: 111771:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 19:43:35 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 19:43:35 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages Mar 14 19:52:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 2 seconds Mar 14 19:52:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 742 previous similar messages Mar 14 19:53:35 fir-io7-s1 kernel: LNetError: 111980:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 19:53:35 fir-io7-s1 kernel: LNetError: 111980:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 14 19:53:35 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 19:53:35 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 84 previous similar messages Mar 14 20:02:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 4 seconds Mar 14 20:02:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 738 previous similar messages Mar 14 20:03:40 fir-io7-s1 kernel: LNetError: 112317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 20:03:40 fir-io7-s1 kernel: LNetError: 112317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 20:03:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 20:03:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 91 previous similar messages Mar 14 20:12:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 5 seconds Mar 14 20:12:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 688 previous similar messages Mar 14 20:13:40 fir-io7-s1 kernel: LNetError: 112317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 20:13:40 fir-io7-s1 kernel: LNetError: 112317:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 20:13:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 20:13:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 97 previous similar messages Mar 14 20:23:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 10 seconds Mar 14 20:23:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 708 previous similar messages Mar 14 20:23:40 fir-io7-s1 kernel: LNetError: 113161:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 20:23:40 fir-io7-s1 kernel: LNetError: 113161:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 20:23:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 20:23:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 117 previous similar messages Mar 14 20:33:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 14 20:33:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 513 previous similar messages Mar 14 20:33:40 fir-io7-s1 kernel: LNetError: 113383:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 20:33:40 fir-io7-s1 kernel: LNetError: 113383:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 14 20:33:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 20:33:40 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 156 previous similar messages Mar 14 20:43:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 14 20:43:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 614 previous similar messages Mar 14 20:43:45 fir-io7-s1 kernel: LNetError: 113817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 20:43:45 fir-io7-s1 kernel: LNetError: 113817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 14 20:43:45 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 20:43:45 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages Mar 14 20:53:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 14 20:53:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 617 previous similar messages Mar 14 20:53:45 fir-io7-s1 kernel: LNetError: 113817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 14 20:53:45 fir-io7-s1 kernel: LNetError: 113817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 14 20:53:45 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 20:53:45 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 114 previous similar messages Mar 14 21:03:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 14 21:03:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 505 previous similar messages Mar 14 21:03:45 fir-io7-s1 kernel: LNetError: 114259:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 21:03:45 fir-io7-s1 kernel: LNetError: 113619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 21:03:45 fir-io7-s1 kernel: LNetError: 113619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 14 21:03:45 fir-io7-s1 kernel: LNetError: 114259:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 335 previous similar messages Mar 14 21:11:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 21:11:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:12:24 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client e24bce0c-f7ae-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c73265ec800, cur 1584245544 expire 1584245394 last 1584245317 Mar 14 21:12:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:13:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 14 21:13:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 423 previous similar messages Mar 14 21:13:45 fir-io7-s1 kernel: LNetError: 114955:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 14 21:13:45 fir-io7-s1 kernel: LNetError: 114955:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 14 21:13:45 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 21:13:45 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 14 21:20:45 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 17111f78-7afb-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ebcaf3c00, cur 1584246045 expire 1584245895 last 1584245818 Mar 14 21:20:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:21:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 21:21:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:23:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 7 seconds Mar 14 21:23:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 215 previous similar messages Mar 14 21:23:45 fir-io7-s1 kernel: LNetError: 114536:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 21:23:45 fir-io7-s1 kernel: LNetError: 114536:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 14 21:23:55 fir-io7-s1 kernel: LNetError: 114955:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 21:23:55 fir-io7-s1 kernel: LNetError: 114955:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 310 previous similar messages Mar 14 21:29:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 21:29:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:30:47 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client e9d3ce30-67c3-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4f7dd9e400, cur 1584246647 expire 1584246497 last 1584246420 Mar 14 21:30:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:33:45 fir-io7-s1 kernel: LNetError: 115334:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 21:33:45 fir-io7-s1 kernel: LNetError: 115334:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 14 21:33:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 8 seconds Mar 14 21:33:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 468 previous similar messages Mar 14 21:34:00 fir-io7-s1 kernel: LNetError: 115673:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 21:34:00 fir-io7-s1 kernel: LNetError: 115673:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 14 21:37:59 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 21:37:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:38:57 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client e272014f-d8b8-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e9e1000, cur 1584247137 expire 1584246987 last 1584246910 Mar 14 21:38:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:40:27 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 21:40:27 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 21:41:46 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 16e8fbcf-b79e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992d3ec00, cur 1584247306 expire 1584247156 last 1584247079 Mar 14 21:41:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:43:45 fir-io7-s1 kernel: LNetError: 116013:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 21:43:45 fir-io7-s1 kernel: LNetError: 116013:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 14 21:44:00 fir-io7-s1 kernel: LNetError: 116013:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 21:44:00 fir-io7-s1 kernel: LNetError: 116013:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 14 21:44:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 14 21:44:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 289 previous similar messages Mar 14 21:50:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 21:50:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:51:47 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client e035008c-8db4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c73314da400, cur 1584247907 expire 1584247757 last 1584247680 Mar 14 21:51:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 21:53:45 fir-io7-s1 kernel: LNetError: 116013:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 21:53:45 fir-io7-s1 kernel: LNetError: 116013:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 14 21:54:00 fir-io7-s1 kernel: LNetError: 116374:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 21:54:00 fir-io7-s1 kernel: LNetError: 116374:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 21:54:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 3 seconds Mar 14 21:54:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 333 previous similar messages Mar 14 22:03:45 fir-io7-s1 kernel: LNetError: 116734:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 22:03:45 fir-io7-s1 kernel: LNetError: 116734:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 14 22:04:00 fir-io7-s1 kernel: LNetError: 116734:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 22:04:00 fir-io7-s1 kernel: LNetError: 116734:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 14 22:04:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 14 22:04:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 305 previous similar messages Mar 14 22:07:29 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:07:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:08:48 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client ccea9329-d069-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c78e1f19400, cur 1584248928 expire 1584248778 last 1584248701 Mar 14 22:08:48 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:13:50 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 22:13:50 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 14 22:14:05 fir-io7-s1 kernel: LNetError: 117004:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 22:14:05 fir-io7-s1 kernel: LNetError: 117004:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 22:14:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 14 22:14:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 14 22:14:36 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:14:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:15:28 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client e3ca1468-976e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4c9fb3ac00, cur 1584249328 expire 1584249178 last 1584249101 Mar 14 22:15:28 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:23:50 fir-io7-s1 kernel: LNetError: 116792:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 22:23:50 fir-io7-s1 kernel: LNetError: 116792:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 14 22:24:15 fir-io7-s1 kernel: LNetError: 117004:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 14 22:24:15 fir-io7-s1 kernel: LNetError: 117004:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 22:24:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 14 22:24:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 452 previous similar messages Mar 14 22:27:56 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:27:56 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 22:28:51 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 06eb66f8-2987-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c80001f0c00, cur 1584250131 expire 1584249981 last 1584249904 Mar 14 22:28:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:33:54 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 22:33:54 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 14 22:34:15 fir-io7-s1 kernel: LNetError: 117775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 22:34:15 fir-io7-s1 kernel: LNetError: 117775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 22:34:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 14 22:34:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 479 previous similar messages Mar 14 22:35:29 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:35:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:36:20 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 7756e271-d399-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c73314d9800, cur 1584250580 expire 1584250430 last 1584250353 Mar 14 22:36:20 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:43:34 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:43:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:43:55 fir-io7-s1 kernel: LNetError: 118149:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 22:43:55 fir-io7-s1 kernel: LNetError: 118149:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 14 22:44:15 fir-io7-s1 kernel: LNetError: 117775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 22:44:15 fir-io7-s1 kernel: LNetError: 117775:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 22:44:43 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 87444dd8-20f6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c78a5e68400, cur 1584251083 expire 1584250933 last 1584250856 Mar 14 22:44:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:44:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 14 22:44:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 369 previous similar messages Mar 14 22:51:12 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:51:12 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:51:12 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 14 22:52:23 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b5ac098c-4091-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6c0009e400, cur 1584251543 expire 1584251393 last 1584251316 Mar 14 22:52:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:54:05 fir-io7-s1 kernel: LNetError: 118460:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 22:54:05 fir-io7-s1 kernel: LNetError: 118460:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 211 previous similar messages Mar 14 22:54:20 fir-io7-s1 kernel: LNetError: 118460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 22:54:20 fir-io7-s1 kernel: LNetError: 118460:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 14 22:54:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 3 seconds Mar 14 22:54:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages Mar 14 22:58:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 22:58:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 22:59:10 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 59eef819-b540-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6b79bc9800, cur 1584251950 expire 1584251800 last 1584251723 Mar 14 22:59:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:04:05 fir-io7-s1 kernel: LNetError: 117955:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 23:04:05 fir-io7-s1 kernel: LNetError: 117955:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 14 23:04:30 fir-io7-s1 kernel: LNetError: 118688:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 14 23:04:30 fir-io7-s1 kernel: LNetError: 118688:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 23:05:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 14 23:05:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 531 previous similar messages Mar 14 23:11:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 23:11:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:12:23 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 07828a5f-a415-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7319cf2400, cur 1584252743 expire 1584252593 last 1584252516 Mar 14 23:12:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:14:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 23:14:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 14 23:14:35 fir-io7-s1 kernel: LNetError: 119166:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 23:14:35 fir-io7-s1 kernel: LNetError: 119166:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 14 23:15:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 14 23:15:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 478 previous similar messages Mar 14 23:24:10 fir-io7-s1 kernel: LNetError: 119345:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 23:24:10 fir-io7-s1 kernel: LNetError: 119345:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 14 23:24:40 fir-io7-s1 kernel: LNetError: 119590:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 23:24:40 fir-io7-s1 kernel: LNetError: 119590:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 14 23:25:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 14 23:25:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages Mar 14 23:34:10 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 14 23:34:10 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 14 23:34:40 fir-io7-s1 kernel: LNetError: 119590:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 14 23:34:40 fir-io7-s1 kernel: LNetError: 119590:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 14 23:35:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 14 23:35:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages Mar 14 23:44:10 fir-io7-s1 kernel: LNetError: 120199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 23:44:10 fir-io7-s1 kernel: LNetError: 120199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 14 23:44:45 fir-io7-s1 kernel: LNetError: 120199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 14 23:44:45 fir-io7-s1 kernel: LNetError: 120199:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 14 23:45:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 14 23:45:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 394 previous similar messages Mar 14 23:49:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 23:49:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:50:38 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 5a3f791c-6860-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7a01e01800, cur 1584255038 expire 1584254888 last 1584254811 Mar 14 23:50:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:52:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 23:52:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:53:19 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 4ff18c73-aee0-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a093b8800, cur 1584255199 expire 1584255049 last 1584254972 Mar 14 23:53:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 14 23:54:10 fir-io7-s1 kernel: LNetError: 120199:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 14 23:54:10 fir-io7-s1 kernel: LNetError: 120199:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 14 23:54:55 fir-io7-s1 kernel: LNetError: 120654:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 14 23:54:55 fir-io7-s1 kernel: LNetError: 120654:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 14 23:55:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 14 23:55:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 500 previous similar messages Mar 14 23:59:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 14 23:59:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:00:49 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 9e6368fb-a774-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c78ef41e000, cur 1584255649 expire 1584255499 last 1584255422 Mar 15 00:00:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:04:10 fir-io7-s1 kernel: LNetError: 120916:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 00:04:10 fir-io7-s1 kernel: LNetError: 120916:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 15 00:05:05 fir-io7-s1 kernel: LNetError: 120654:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 00:05:05 fir-io7-s1 kernel: LNetError: 120654:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 15 00:05:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 00:05:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 504 previous similar messages Mar 15 00:12:52 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 00:12:52 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 15 00:13:47 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client cbfeae0a-7595-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4c05accc00, cur 1584256427 expire 1584256277 last 1584256200 Mar 15 00:13:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:14:15 fir-io7-s1 kernel: LNetError: 121053:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 00:14:15 fir-io7-s1 kernel: LNetError: 121053:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 15 00:15:05 fir-io7-s1 kernel: LNetError: 121053:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 00:15:05 fir-io7-s1 kernel: LNetError: 121053:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 00:15:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 15 00:15:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 409 previous similar messages Mar 15 00:22:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 00:22:44 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 15 00:23:46 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client be39122d-9a73-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79ceab3400, cur 1584257026 expire 1584256876 last 1584256799 Mar 15 00:23:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:24:15 fir-io7-s1 kernel: LNetError: 121400:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 00:24:15 fir-io7-s1 kernel: LNetError: 121400:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 15 00:25:05 fir-io7-s1 kernel: LNetError: 121717:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 00:25:05 fir-io7-s1 kernel: LNetError: 121717:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 283 previous similar messages Mar 15 00:25:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 15 00:25:30 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 438 previous similar messages Mar 15 00:34:20 fir-io7-s1 kernel: LNetError: 121717:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 00:34:20 fir-io7-s1 kernel: LNetError: 121717:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 201 previous similar messages Mar 15 00:35:15 fir-io7-s1 kernel: LNetError: 121717:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 00:35:15 fir-io7-s1 kernel: LNetError: 121717:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 15 00:35:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 4 seconds Mar 15 00:35:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 415 previous similar messages Mar 15 00:35:44 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 73ab419b-96af-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79ce41c400, cur 1584257744 expire 1584257594 last 1584257517 Mar 15 00:35:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:35:45 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 00:35:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:44:25 fir-io7-s1 kernel: LNetError: 121717:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 00:44:25 fir-io7-s1 kernel: LNetError: 121717:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 15 00:45:20 fir-io7-s1 kernel: LNetError: 121717:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 00:45:20 fir-io7-s1 kernel: LNetError: 121717:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 305 previous similar messages Mar 15 00:45:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 15 00:45:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 438 previous similar messages Mar 15 00:47:03 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 00:47:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:47:55 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 0a369f03-0f04-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c740c7cf400, cur 1584258475 expire 1584258325 last 1584258248 Mar 15 00:47:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:53:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 00:53:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:54:11 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client b01589ca-7419-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7ae1d79000, cur 1584258851 expire 1584258701 last 1584258624 Mar 15 00:54:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 00:54:30 fir-io7-s1 kernel: LNetError: 122452:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 00:54:30 fir-io7-s1 kernel: LNetError: 122452:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 15 00:55:30 fir-io7-s1 kernel: LNetError: 122783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 15 00:55:30 fir-io7-s1 kernel: LNetError: 122783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 15 00:55:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 15 00:55:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 542 previous similar messages Mar 15 01:00:01 fir-io7-s1 kernel: md: data-check of RAID array md2 Mar 15 01:00:07 fir-io7-s1 kernel: md: data-check of RAID array md4 Mar 15 01:04:35 fir-io7-s1 kernel: LNetError: 122451:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 01:04:35 fir-io7-s1 kernel: LNetError: 122451:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 215 previous similar messages Mar 15 01:05:40 fir-io7-s1 kernel: LNetError: 122783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 01:05:40 fir-io7-s1 kernel: LNetError: 122783:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 323 previous similar messages Mar 15 01:06:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 15 01:06:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 579 previous similar messages Mar 15 01:14:35 fir-io7-s1 kernel: LNetError: 123255:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 01:14:35 fir-io7-s1 kernel: LNetError: 123255:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 15 01:15:40 fir-io7-s1 kernel: LNetError: 123255:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 01:15:40 fir-io7-s1 kernel: LNetError: 123255:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 15 01:16:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds Mar 15 01:16:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 538 previous similar messages Mar 15 01:24:35 fir-io7-s1 kernel: LNetError: 123634:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 01:24:35 fir-io7-s1 kernel: LNetError: 123634:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 15 01:25:40 fir-io7-s1 kernel: LNetError: 123971:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 01:25:40 fir-io7-s1 kernel: LNetError: 123971:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 01:26:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 7 seconds Mar 15 01:26:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 367 previous similar messages Mar 15 01:34:35 fir-io7-s1 kernel: LNetError: 123971:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 01:34:35 fir-io7-s1 kernel: LNetError: 123971:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 15 01:35:40 fir-io7-s1 kernel: LNetError: 124355:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 01:35:40 fir-io7-s1 kernel: LNetError: 124355:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 15 01:36:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 15 01:36:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 386 previous similar messages Mar 15 01:44:35 fir-io7-s1 kernel: LNetError: 124573:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 01:44:35 fir-io7-s1 kernel: LNetError: 124573:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 15 01:45:45 fir-io7-s1 kernel: LNetError: 124355:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 01:45:45 fir-io7-s1 kernel: LNetError: 124355:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 01:46:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 15 01:46:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 406 previous similar messages Mar 15 01:54:35 fir-io7-s1 kernel: LNetError: 124981:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 01:54:35 fir-io7-s1 kernel: LNetError: 124981:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 15 01:55:50 fir-io7-s1 kernel: LNetError: 124981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 01:55:50 fir-io7-s1 kernel: LNetError: 124981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 15 01:56:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 2 seconds Mar 15 01:56:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 299 previous similar messages Mar 15 02:04:35 fir-io7-s1 kernel: LNetError: 125356:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 02:04:35 fir-io7-s1 kernel: LNetError: 125356:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 15 02:06:00 fir-io7-s1 kernel: LNetError: 124981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 02:06:00 fir-io7-s1 kernel: LNetError: 124981:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 15 02:07:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 6 seconds Mar 15 02:07:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 452 previous similar messages Mar 15 02:14:35 fir-io7-s1 kernel: LNetError: 125356:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 02:14:35 fir-io7-s1 kernel: LNetError: 125356:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 15 02:16:10 fir-io7-s1 kernel: LNetError: 125567:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 02:16:10 fir-io7-s1 kernel: LNetError: 125567:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 15 02:17:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 15 02:17:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 492 previous similar messages Mar 15 02:24:21 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 02:24:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:24:35 fir-io7-s1 kernel: LNetError: 125584:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 02:24:35 fir-io7-s1 kernel: LNetError: 125584:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 15 02:25:24 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 87bd8652-951d-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6b34efb000, cur 1584264324 expire 1584264174 last 1584264097 Mar 15 02:25:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:26:10 fir-io7-s1 kernel: LNetError: 125977:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 02:26:10 fir-io7-s1 kernel: LNetError: 125977:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 02:26:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 02:26:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:27:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds Mar 15 02:27:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 464 previous similar messages Mar 15 02:28:08 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 31ae654d-3470-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7ae1d7ec00, cur 1584264488 expire 1584264338 last 1584264261 Mar 15 02:28:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:34:40 fir-io7-s1 kernel: LNetError: 126357:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 02:34:40 fir-io7-s1 kernel: LNetError: 126357:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 15 02:35:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 02:35:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:36:10 fir-io7-s1 kernel: LNetError: 126682:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 02:36:10 fir-io7-s1 kernel: LNetError: 126682:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 02:36:11 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 7d9dcccd-5904-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a1be8b000, cur 1584264971 expire 1584264821 last 1584264744 Mar 15 02:36:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:37:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 15 02:37:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 471 previous similar messages Mar 15 02:37:31 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 02:37:31 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:38:49 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 6f9a4147-5d50-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cea81400, cur 1584265129 expire 1584264979 last 1584264902 Mar 15 02:38:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:44:30 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 02:44:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:44:40 fir-io7-s1 kernel: LNetError: 126682:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 02:44:40 fir-io7-s1 kernel: LNetError: 126682:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 15 02:45:30 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 5fa4629f-efae-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c803b02cc00, cur 1584265530 expire 1584265380 last 1584265303 Mar 15 02:45:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:46:15 fir-io7-s1 kernel: LNetError: 127061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 02:46:15 fir-io7-s1 kernel: LNetError: 127061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 15 02:47:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 15 02:47:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 499 previous similar messages Mar 15 02:54:40 fir-io7-s1 kernel: LNetError: 127020:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 02:54:40 fir-io7-s1 kernel: LNetError: 127020:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 15 02:55:43 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 9f5b18d0-eca0-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7319cf3c00, cur 1584266143 expire 1584265993 last 1584265916 Mar 15 02:55:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:55:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 02:55:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:56:24 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 02:56:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:56:25 fir-io7-s1 kernel: LNetError: 127272:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 02:56:25 fir-io7-s1 kernel: LNetError: 127272:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 02:56:59 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 19d58cb8-52aa-4 (at 10.49.26.4@o2ib1) in 221 seconds. I think it's dead, and I am evicting it. exp ffff9c597edfc000, cur 1584266219 expire 1584266069 last 1584265998 Mar 15 02:56:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 02:57:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 02:57:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 456 previous similar messages Mar 15 03:04:50 fir-io7-s1 kernel: LNetError: 127500:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 03:04:50 fir-io7-s1 kernel: LNetError: 127500:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 15 03:06:35 fir-io7-s1 kernel: LNetError: 127840:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 03:06:35 fir-io7-s1 kernel: LNetError: 127840:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 03:07:03 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 03:07:03 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 15 03:07:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 03:07:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 460 previous similar messages Mar 15 03:08:05 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 80172396-5f7f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6eecef8000, cur 1584266885 expire 1584266735 last 1584266658 Mar 15 03:08:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:14:50 fir-io7-s1 kernel: LNetError: 128124:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 03:14:50 fir-io7-s1 kernel: LNetError: 128124:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 15 03:16:35 fir-io7-s1 kernel: LNetError: 128124:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 03:16:35 fir-io7-s1 kernel: LNetError: 128124:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 15 03:17:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 03:17:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 454 previous similar messages Mar 15 03:24:50 fir-io7-s1 kernel: LNetError: 128557:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 03:24:50 fir-io7-s1 kernel: LNetError: 128557:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 15 03:25:54 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 12f8ac81-bb1e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c88d07dcc00, cur 1584267954 expire 1584267804 last 1584267727 Mar 15 03:25:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:26:35 fir-io7-s1 kernel: LNetError: 128395:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 15 03:26:35 fir-io7-s1 kernel: LNetError: 128395:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 03:26:56 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 03:26:56 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 15 03:27:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 03:27:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 401 previous similar messages Mar 15 03:34:50 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 03:34:50 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 15 03:36:12 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 03:36:12 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:36:35 fir-io7-s1 kernel: LNetError: 128957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 03:36:35 fir-io7-s1 kernel: LNetError: 128957:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 285 previous similar messages Mar 15 03:37:25 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 0efb400c-bcf1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f202000, cur 1584268645 expire 1584268495 last 1584268418 Mar 15 03:37:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:37:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 7 seconds Mar 15 03:37:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 399 previous similar messages Mar 15 03:44:50 fir-io7-s1 kernel: LNetError: 129337:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 03:44:50 fir-io7-s1 kernel: LNetError: 129337:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 15 03:46:40 fir-io7-s1 kernel: LNetError: 129337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 03:46:40 fir-io7-s1 kernel: LNetError: 129337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 15 03:47:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 15 03:47:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 428 previous similar messages Mar 15 03:48:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 03:48:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:49:12 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 1836b3d2-51b1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4eb7ac0000, cur 1584269352 expire 1584269202 last 1584269125 Mar 15 03:49:12 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:54:50 fir-io7-s1 kernel: LNetError: 129641:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 03:54:50 fir-io7-s1 kernel: LNetError: 129641:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 15 03:56:45 fir-io7-s1 kernel: LNetError: 129641:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 03:56:45 fir-io7-s1 kernel: LNetError: 129641:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 284 previous similar messages Mar 15 03:57:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 15 03:57:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 339 previous similar messages Mar 15 03:58:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 03:58:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 03:59:53 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 4f72a9c4-e1b7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7a37863800, cur 1584269993 expire 1584269843 last 1584269766 Mar 15 03:59:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:04:50 fir-io7-s1 kernel: LNetError: 129718:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 04:04:50 fir-io7-s1 kernel: LNetError: 129718:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 208 previous similar messages Mar 15 04:06:45 fir-io7-s1 kernel: LNetError: 129848:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 04:06:45 fir-io7-s1 kernel: LNetError: 129848:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 15 04:07:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 04:07:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 435 previous similar messages Mar 15 04:10:33 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 04:10:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:11:33 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client e952c218-a602-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c70682c0000, cur 1584270693 expire 1584270543 last 1584270466 Mar 15 04:11:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:14:50 fir-io7-s1 kernel: LNetError: 130249:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 04:14:50 fir-io7-s1 kernel: LNetError: 130249:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 15 04:16:45 fir-io7-s1 kernel: LNetError: 130559:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 04:16:45 fir-io7-s1 kernel: LNetError: 130559:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 04:17:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 15 04:17:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 111 previous similar messages Mar 15 04:21:38 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 04:21:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:22:43 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 39f6fee9-e3c9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79ccd05400, cur 1584271363 expire 1584271213 last 1584271136 Mar 15 04:22:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:24:50 fir-io7-s1 kernel: LNetError: 130559:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 04:24:50 fir-io7-s1 kernel: LNetError: 130559:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 15 04:26:50 fir-io7-s1 kernel: LNetError: 130940:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 04:26:50 fir-io7-s1 kernel: LNetError: 130940:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 15 04:28:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 15 04:28:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 263 previous similar messages Mar 15 04:32:46 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 04:32:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:32:58 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 77f09a93-d803-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f1c5d0400, cur 1584271978 expire 1584271828 last 1584271751 Mar 15 04:32:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:34:56 fir-io7-s1 kernel: LNetError: 489:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 04:34:56 fir-io7-s1 kernel: LNetError: 489:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 15 04:36:56 fir-io7-s1 kernel: LNetError: 489:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 04:36:56 fir-io7-s1 kernel: LNetError: 489:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 309 previous similar messages Mar 15 04:38:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 15 04:38:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 434 previous similar messages Mar 15 04:41:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 04:41:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:42:25 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client a347e25a-d7aa-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6c0009a400, cur 1584272545 expire 1584272395 last 1584272318 Mar 15 04:42:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:44:56 fir-io7-s1 kernel: LNetError: 489:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 04:44:56 fir-io7-s1 kernel: LNetError: 489:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 15 04:47:06 fir-io7-s1 kernel: LNetError: 948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 04:47:06 fir-io7-s1 kernel: LNetError: 948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 04:48:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 5 seconds Mar 15 04:48:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 450 previous similar messages Mar 15 04:51:37 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 04:51:37 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:52:29 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client cd014692-bc2b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c52c8f00400, cur 1584273149 expire 1584272999 last 1584272922 Mar 15 04:52:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 04:54:56 fir-io7-s1 kernel: LNetError: 948:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 04:54:56 fir-io7-s1 kernel: LNetError: 948:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 15 04:57:11 fir-io7-s1 kernel: LNetError: 1327:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 04:57:11 fir-io7-s1 kernel: LNetError: 1327:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 04:58:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 1 seconds Mar 15 04:58:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 420 previous similar messages Mar 15 05:01:36 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:01:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:02:56 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client f56f9f8e-86dc-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990e64000, cur 1584273776 expire 1584273626 last 1584273549 Mar 15 05:02:56 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:04:56 fir-io7-s1 kernel: LNetError: 1327:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 05:04:56 fir-io7-s1 kernel: LNetError: 1327:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 15 05:07:11 fir-io7-s1 kernel: LNetError: 1724:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 05:07:11 fir-io7-s1 kernel: LNetError: 1724:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 279 previous similar messages Mar 15 05:08:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 15 05:08:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 410 previous similar messages Mar 15 05:11:15 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 67357126-15f6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f541000, cur 1584274275 expire 1584274125 last 1584274048 Mar 15 05:11:15 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:11:45 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:11:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:15:01 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 05:15:01 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 190 previous similar messages Mar 15 05:17:16 fir-io7-s1 kernel: LNetError: 1938:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 05:17:16 fir-io7-s1 kernel: LNetError: 1938:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 05:18:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 15 05:18:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 390 previous similar messages Mar 15 05:19:03 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:19:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:20:09 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 7ed641c0-32ab-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7c1a9be000, cur 1584274809 expire 1584274659 last 1584274582 Mar 15 05:20:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:25:01 fir-io7-s1 kernel: LNetError: 2474:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 05:25:01 fir-io7-s1 kernel: LNetError: 2474:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 15 05:27:21 fir-io7-s1 kernel: LNetError: 2419:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 05:27:21 fir-io7-s1 kernel: LNetError: 2419:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 05:27:39 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:27:39 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:28:42 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 653dc995-faa7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6bfe4c8800, cur 1584275322 expire 1584275172 last 1584275095 Mar 15 05:28:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:28:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 6 seconds Mar 15 05:28:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 402 previous similar messages Mar 15 05:35:06 fir-io7-s1 kernel: LNetError: 2628:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 05:35:06 fir-io7-s1 kernel: LNetError: 2628:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 15 05:36:08 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:36:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:37:18 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 33a2cf22-dba2-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8a2f121400, cur 1584275838 expire 1584275688 last 1584275611 Mar 15 05:37:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:37:31 fir-io7-s1 kernel: LNetError: 2879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 05:37:31 fir-io7-s1 kernel: LNetError: 2879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 15 05:38:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 9 seconds Mar 15 05:38:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 379 previous similar messages Mar 15 05:44:50 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:44:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:45:06 fir-io7-s1 kernel: LNetError: 2879:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 05:45:06 fir-io7-s1 kernel: LNetError: 2879:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 155 previous similar messages Mar 15 05:45:47 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client f6ac8524-c6a8-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c69922e6c00, cur 1584276347 expire 1584276197 last 1584276120 Mar 15 05:45:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:47:36 fir-io7-s1 kernel: LNetError: 3262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 05:47:36 fir-io7-s1 kernel: LNetError: 3262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 05:48:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 15 05:48:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 279 previous similar messages Mar 15 05:52:30 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 05:52:30 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:52:49 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client be002f5f-ab06-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6bfe4c8800, cur 1584276769 expire 1584276619 last 1584276542 Mar 15 05:52:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 05:55:06 fir-io7-s1 kernel: LNetError: 3262:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 05:55:06 fir-io7-s1 kernel: LNetError: 3262:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages Mar 15 05:57:46 fir-io7-s1 kernel: LNetError: 3641:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 05:57:46 fir-io7-s1 kernel: LNetError: 3641:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 05:59:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 15 05:59:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 393 previous similar messages Mar 15 06:01:19 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 6ab6167f-99ad-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f49d800, cur 1584277279 expire 1584277129 last 1584277052 Mar 15 06:01:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:02:17 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 06:02:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:05:06 fir-io7-s1 kernel: LNetError: 3641:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 06:05:06 fir-io7-s1 kernel: LNetError: 3641:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 15 06:07:46 fir-io7-s1 kernel: LNetError: 4033:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 06:07:46 fir-io7-s1 kernel: LNetError: 4033:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 06:09:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 3 seconds Mar 15 06:09:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 399 previous similar messages Mar 15 06:09:26 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 01f55a9f-bf8d-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6a39e75400, cur 1584277766 expire 1584277616 last 1584277539 Mar 15 06:09:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:10:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 06:10:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:15:06 fir-io7-s1 kernel: LNetError: 4078:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 06:15:06 fir-io7-s1 kernel: LNetError: 4078:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 15 06:17:46 fir-io7-s1 kernel: LNetError: 4033:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 06:17:46 fir-io7-s1 kernel: LNetError: 4033:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 15 06:18:21 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 1a2ccd8e-471b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79be0ca800, cur 1584278301 expire 1584278151 last 1584278074 Mar 15 06:18:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:19:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds Mar 15 06:19:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 344 previous similar messages Mar 15 06:20:38 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 06:20:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:25:08 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 06:25:08 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 15 06:27:46 fir-io7-s1 kernel: LNetError: 4509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 06:27:46 fir-io7-s1 kernel: LNetError: 4509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 06:27:47 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 86f41a59-093c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69923d6400, cur 1584278867 expire 1584278717 last 1584278640 Mar 15 06:27:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:29:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 15 06:29:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 480 previous similar messages Mar 15 06:35:11 fir-io7-s1 kernel: LNetError: 93731:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 06:35:11 fir-io7-s1 kernel: LNetError: 93731:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 118 previous similar messages Mar 15 06:37:46 fir-io7-s1 kernel: LNetError: 4509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 06:37:46 fir-io7-s1 kernel: LNetError: 4509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 06:39:15 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 06:39:15 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 06:39:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 15 06:39:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 471 previous similar messages Mar 15 06:40:19 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 4962d33d-35a9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7a193d0000, cur 1584279619 expire 1584279469 last 1584279392 Mar 15 06:40:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:45:11 fir-io7-s1 kernel: LNetError: 4737:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 06:45:11 fir-io7-s1 kernel: LNetError: 4737:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 15 06:47:51 fir-io7-s1 kernel: LNetError: 5287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 06:47:51 fir-io7-s1 kernel: LNetError: 5287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 06:49:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 5 seconds Mar 15 06:49:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 466 previous similar messages Mar 15 06:50:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 06:50:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 06:55:16 fir-io7-s1 kernel: LNetError: 5045:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 06:55:16 fir-io7-s1 kernel: LNetError: 5045:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 15 06:57:51 fir-io7-s1 kernel: LNetError: 5688:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 06:57:51 fir-io7-s1 kernel: LNetError: 5688:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 06:59:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 15 06:59:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 479 previous similar messages Mar 15 07:00:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 07:00:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 07:00:59 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 6e1b00e0-6634-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7787166400, cur 1584280859 expire 1584280709 last 1584280632 Mar 15 07:00:59 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 07:05:21 fir-io7-s1 kernel: LNetError: 6068:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 07:05:21 fir-io7-s1 kernel: LNetError: 6068:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 115 previous similar messages Mar 15 07:07:51 fir-io7-s1 kernel: LNetError: 6377:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 07:07:51 fir-io7-s1 kernel: LNetError: 6377:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 07:09:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 1 seconds Mar 15 07:09:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 494 previous similar messages Mar 15 07:11:38 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 2d9d4407-ac46-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990e62000, cur 1584281498 expire 1584281348 last 1584281271 Mar 15 07:11:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 07:15:21 fir-io7-s1 kernel: LNetError: 6377:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 07:15:21 fir-io7-s1 kernel: LNetError: 6377:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 125 previous similar messages Mar 15 07:17:56 fir-io7-s1 kernel: LNetError: 6789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 07:17:56 fir-io7-s1 kernel: LNetError: 6789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 07:19:49 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 07:19:49 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 07:20:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 5 seconds Mar 15 07:20:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 452 previous similar messages Mar 15 07:20:43 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client b37abbae-df9f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7ae1d7ac00, cur 1584282043 expire 1584281893 last 1584281816 Mar 15 07:20:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 07:25:21 fir-io7-s1 kernel: LNetError: 6104:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 07:25:21 fir-io7-s1 kernel: LNetError: 6104:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 113 previous similar messages Mar 15 07:27:56 fir-io7-s1 kernel: LNetError: 6789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 07:27:56 fir-io7-s1 kernel: LNetError: 6789:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 07:30:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 07:30:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 07:30:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds Mar 15 07:30:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages Mar 15 07:30:51 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client cc9651f1-f2ee-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e431c00, cur 1584282651 expire 1584282501 last 1584282424 Mar 15 07:30:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 07:35:31 fir-io7-s1 kernel: LNetError: 7274:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 07:35:31 fir-io7-s1 kernel: LNetError: 7274:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 121 previous similar messages Mar 15 07:38:06 fir-io7-s1 kernel: LNetError: 7561:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 07:38:06 fir-io7-s1 kernel: LNetError: 7561:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 07:40:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 07:40:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 450 previous similar messages Mar 15 07:45:31 fir-io7-s1 kernel: LNetError: 106627:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 07:45:31 fir-io7-s1 kernel: LNetError: 106627:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages Mar 15 07:48:11 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 07:48:11 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 07:48:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 07:48:32 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 07:49:36 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client d50e7b27-353f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6f5178dc00, cur 1584283776 expire 1584283626 last 1584283549 Mar 15 07:49:36 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 07:50:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 15 07:50:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 475 previous similar messages Mar 15 07:55:31 fir-io7-s1 kernel: LNetError: 7932:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 07:55:31 fir-io7-s1 kernel: LNetError: 7932:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages Mar 15 07:58:11 fir-io7-s1 kernel: LNetError: 7932:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 07:58:11 fir-io7-s1 kernel: LNetError: 7932:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 08:00:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 08:00:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 459 previous similar messages Mar 15 08:03:54 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 08:03:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 08:04:53 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 720f2229-dc43-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7d9152f800, cur 1584284693 expire 1584284543 last 1584284466 Mar 15 08:04:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 08:05:36 fir-io7-s1 kernel: LNetError: 8434:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 08:05:36 fir-io7-s1 kernel: LNetError: 8434:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages Mar 15 08:08:21 fir-io7-s1 kernel: LNetError: 98374:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 08:08:21 fir-io7-s1 kernel: LNetError: 98374:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 15 08:10:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 15 08:10:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 475 previous similar messages Mar 15 08:15:36 fir-io7-s1 kernel: LNetError: 8489:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 08:15:36 fir-io7-s1 kernel: LNetError: 8489:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 143 previous similar messages Mar 15 08:17:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 08:17:22 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 08:18:13 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 35463a64-d69a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8a25358400, cur 1584285493 expire 1584285343 last 1584285266 Mar 15 08:18:13 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 08:18:26 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 08:18:26 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 08:20:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 15 08:20:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 452 previous similar messages Mar 15 08:25:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 08:25:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 15 08:28:31 fir-io7-s1 kernel: LNetError: 7946:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 08:28:31 fir-io7-s1 kernel: LNetError: 7946:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 284 previous similar messages Mar 15 08:30:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 2 seconds Mar 15 08:30:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages Mar 15 08:31:21 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 08:31:21 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 08:32:19 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client cab69c6e-ca40-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6d026e1000, cur 1584286339 expire 1584286189 last 1584286112 Mar 15 08:32:19 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 08:35:46 fir-io7-s1 kernel: LNetError: 9411:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 08:35:46 fir-io7-s1 kernel: LNetError: 9411:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 15 08:38:41 fir-io7-s1 kernel: LNetError: 9411:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 08:38:41 fir-io7-s1 kernel: LNetError: 9411:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 15 08:40:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 15 08:40:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 306 previous similar messages Mar 15 08:45:46 fir-io7-s1 kernel: LNetError: 8489:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 08:45:46 fir-io7-s1 kernel: LNetError: 8489:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 15 08:47:56 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 8d3e1f60-72d8-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd69dc00, cur 1584287276 expire 1584287126 last 1584287049 Mar 15 08:47:56 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 08:48:51 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 08:48:51 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 286 previous similar messages Mar 15 08:49:25 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 08:49:25 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 08:50:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 15 08:50:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 359 previous similar messages Mar 15 08:55:51 fir-io7-s1 kernel: LNetError: 8010:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 08:55:51 fir-io7-s1 kernel: LNetError: 8010:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 15 08:59:01 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 08:59:01 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 09:00:04 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 09:00:04 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:00:53 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 3c6b11b7-9a85-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f663800, cur 1584288053 expire 1584287903 last 1584287826 Mar 15 09:00:53 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:00:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 15 09:00:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 378 previous similar messages Mar 15 09:05:51 fir-io7-s1 kernel: LNetError: 8010:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 09:05:51 fir-io7-s1 kernel: LNetError: 8010:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 15 09:09:01 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 09:09:01 fir-io7-s1 kernel: LNetError: 8010:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 286 previous similar messages Mar 15 09:11:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 15 09:11:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 396 previous similar messages Mar 15 09:13:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 09:13:18 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 09:14:29 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 642b308b-2a74-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7319cf5c00, cur 1584288869 expire 1584288719 last 1584288642 Mar 15 09:14:29 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:15:51 fir-io7-s1 kernel: LNetError: 10183:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 09:15:51 fir-io7-s1 kernel: LNetError: 10183:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 15 09:19:01 fir-io7-s1 kernel: LNetError: 11431:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 09:19:01 fir-io7-s1 kernel: LNetError: 11431:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 09:21:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 15 09:21:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 349 previous similar messages Mar 15 09:25:56 fir-io7-s1 kernel: LNetError: 11431:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 09:25:56 fir-io7-s1 kernel: LNetError: 11431:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 15 09:27:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 09:27:00 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:28:10 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 12946e92-3d2c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c740c7cdc00, cur 1584289690 expire 1584289540 last 1584289463 Mar 15 09:28:10 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:29:01 fir-io7-s1 kernel: LNetError: 11824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 09:29:01 fir-io7-s1 kernel: LNetError: 11824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 09:31:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 1 seconds Mar 15 09:31:14 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 405 previous similar messages Mar 15 09:35:56 fir-io7-s1 kernel: LNetError: 11824:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 09:35:56 fir-io7-s1 kernel: LNetError: 11824:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 15 09:39:06 fir-io7-s1 kernel: LNetError: 11824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 09:39:06 fir-io7-s1 kernel: LNetError: 11824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 09:41:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 15 09:41:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 470 previous similar messages Mar 15 09:42:30 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 09:42:30 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:43:50 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 8a04c1b4-f003-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990409800, cur 1584290630 expire 1584290480 last 1584290403 Mar 15 09:43:50 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 09:46:01 fir-io7-s1 kernel: LNetError: 11824:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 09:46:01 fir-io7-s1 kernel: LNetError: 11824:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 15 09:49:16 fir-io7-s1 kernel: LNetError: 12628:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 09:49:16 fir-io7-s1 kernel: LNetError: 12628:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 09:51:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 15 09:51:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 619 previous similar messages Mar 15 09:56:01 fir-io7-s1 kernel: LNetError: 11825:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 09:56:01 fir-io7-s1 kernel: LNetError: 11825:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 157 previous similar messages Mar 15 09:59:21 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 09:59:21 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 286 previous similar messages Mar 15 10:01:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 15 10:01:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 698 previous similar messages Mar 15 10:06:01 fir-io7-s1 kernel: LNetError: 11825:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 10:06:01 fir-io7-s1 kernel: LNetError: 11825:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 153 previous similar messages Mar 15 10:09:31 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 10:09:31 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 286 previous similar messages Mar 15 10:11:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 15 10:11:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 842 previous similar messages Mar 15 10:16:01 fir-io7-s1 kernel: LNetError: 12628:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 10:16:01 fir-io7-s1 kernel: LNetError: 12628:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 151 previous similar messages Mar 15 10:19:41 fir-io7-s1 kernel: LNetError: 12628:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 10:19:41 fir-io7-s1 kernel: LNetError: 12628:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 10:21:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 9 seconds Mar 15 10:21:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 840 previous similar messages Mar 15 10:24:20 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 10:24:20 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 10:25:17 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client dd3a034c-1a6c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79ceab3400, cur 1584293117 expire 1584292967 last 1584292890 Mar 15 10:25:17 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 10:26:01 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 10:26:01 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 15 10:29:46 fir-io7-s1 kernel: LNetError: 14092:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 10:29:46 fir-io7-s1 kernel: LNetError: 14092:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 10:31:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 10:31:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 10:31:53 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 8d3554a7-5fd2-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c5037c97c00, cur 1584293513 expire 1584293363 last 1584293286 Mar 15 10:31:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 10:32:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 1 seconds Mar 15 10:32:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 742 previous similar messages Mar 15 10:36:11 fir-io7-s1 kernel: LNetError: 14392:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 10:36:11 fir-io7-s1 kernel: LNetError: 14392:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 76 previous similar messages Mar 15 10:37:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 10:37:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 10:38:55 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 615f4aae-b409-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e967c00, cur 1584293935 expire 1584293785 last 1584293708 Mar 15 10:38:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 10:39:51 fir-io7-s1 kernel: LNetError: 14645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 10:39:51 fir-io7-s1 kernel: LNetError: 14645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 10:42:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 7 seconds Mar 15 10:42:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 653 previous similar messages Mar 15 10:43:49 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 10:43:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 10:46:16 fir-io7-s1 kernel: LNetError: 14645:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 10:46:16 fir-io7-s1 kernel: LNetError: 14645:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 87 previous similar messages Mar 15 10:47:40 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 062ee0a6-a9ea-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c71d99a0800, cur 1584294460 expire 1584294310 last 1584294233 Mar 15 10:47:40 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 10:50:01 fir-io7-s1 kernel: LNetError: 14645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 10:50:01 fir-io7-s1 kernel: LNetError: 14645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 10:52:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 1 seconds Mar 15 10:52:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 654 previous similar messages Mar 15 10:56:16 fir-io7-s1 kernel: LNetError: 15360:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 10:56:16 fir-io7-s1 kernel: LNetError: 15360:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 79 previous similar messages Mar 15 11:00:06 fir-io7-s1 kernel: LNetError: 14645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 11:00:06 fir-io7-s1 kernel: LNetError: 14645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 11:01:23 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 167f4ce2-d66c-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698e40cc00, cur 1584295283 expire 1584295133 last 1584295056 Mar 15 11:01:23 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 11:01:49 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 11:01:49 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 11:02:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 2 seconds Mar 15 11:02:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 578 previous similar messages Mar 15 11:06:21 fir-io7-s1 kernel: LNetError: 15564:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 11:06:21 fir-io7-s1 kernel: LNetError: 15564:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 112 previous similar messages Mar 15 11:10:16 fir-io7-s1 kernel: LNetError: 15809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 11:10:16 fir-io7-s1 kernel: LNetError: 15809:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 11:12:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 6 seconds Mar 15 11:12:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 743 previous similar messages Mar 15 11:12:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 11:12:35 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 11:12:41 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 74037cc7-ef53-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c71d99a2800, cur 1584295961 expire 1584295811 last 1584295734 Mar 15 11:12:41 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 11:16:26 fir-io7-s1 kernel: LNetError: 15809:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 11:16:26 fir-io7-s1 kernel: LNetError: 15809:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 87 previous similar messages Mar 15 11:20:21 fir-io7-s1 kernel: LNetError: 16186:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 11:20:21 fir-io7-s1 kernel: LNetError: 16186:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 11:22:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 15 11:22:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 695 previous similar messages Mar 15 11:22:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 11:22:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 11:23:05 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 20b1c5bf-d0b1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990a9a000, cur 1584296585 expire 1584296435 last 1584296358 Mar 15 11:23:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 11:26:31 fir-io7-s1 kernel: LNetError: 16441:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 11:26:31 fir-io7-s1 kernel: LNetError: 16441:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 88 previous similar messages Mar 15 11:30:21 fir-io7-s1 kernel: LNetError: 16441:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 11:30:21 fir-io7-s1 kernel: LNetError: 16441:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 11:32:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 6 seconds Mar 15 11:32:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 706 previous similar messages Mar 15 11:36:31 fir-io7-s1 kernel: LNetError: 16718:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 11:36:31 fir-io7-s1 kernel: LNetError: 16718:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 91 previous similar messages Mar 15 11:38:23 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 11:38:23 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 11:40:26 fir-io7-s1 kernel: LNetError: 16951:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 11:40:26 fir-io7-s1 kernel: LNetError: 16951:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 11:42:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 7 seconds Mar 15 11:42:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 719 previous similar messages Mar 15 11:43:37 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client faf7a4b7-afcd-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cef46400, cur 1584297817 expire 1584297667 last 1584297590 Mar 15 11:43:37 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 11:46:31 fir-io7-s1 kernel: LNetError: 16951:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 11:46:31 fir-io7-s1 kernel: LNetError: 16951:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 89 previous similar messages Mar 15 11:50:26 fir-io7-s1 kernel: LNetError: 17329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 11:50:26 fir-io7-s1 kernel: LNetError: 17329:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 11:52:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds Mar 15 11:52:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 742 previous similar messages Mar 15 11:56:31 fir-io7-s1 kernel: LNetError: 17329:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 11:56:31 fir-io7-s1 kernel: LNetError: 17329:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 95 previous similar messages Mar 15 12:00:31 fir-io7-s1 kernel: LNetError: 17708:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 12:00:31 fir-io7-s1 kernel: LNetError: 17708:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 12:01:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 12:01:06 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 12:02:08 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 33d38a04-292f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c78fe142c00, cur 1584298928 expire 1584298778 last 1584298701 Mar 15 12:02:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:02:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 15 12:02:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 751 previous similar messages Mar 15 12:06:36 fir-io7-s1 kernel: LNetError: 17708:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 12:06:36 fir-io7-s1 kernel: LNetError: 17708:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 15 12:07:11 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 12:07:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:08:15 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client d349f2ed-87e3-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990efbc00, cur 1584299295 expire 1584299145 last 1584299068 Mar 15 12:08:15 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:10:41 fir-io7-s1 kernel: LNetError: 18217:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 15 12:10:41 fir-io7-s1 kernel: LNetError: 18217:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 12:12:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 9 seconds Mar 15 12:12:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 648 previous similar messages Mar 15 12:16:36 fir-io7-s1 kernel: LNetError: 11825:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 12:16:36 fir-io7-s1 kernel: LNetError: 11825:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 111 previous similar messages Mar 15 12:20:51 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 12:20:51 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 15 12:22:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 2 seconds Mar 15 12:22:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 662 previous similar messages Mar 15 12:23:57 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 574bdb3b-2734-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7acf46f000, cur 1584300237 expire 1584300087 last 1584300010 Mar 15 12:23:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:24:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 12:24:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:26:36 fir-io7-s1 kernel: LNetError: 18610:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 12:26:36 fir-io7-s1 kernel: LNetError: 18610:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 127 previous similar messages Mar 15 12:26:56 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client f8532c74-39dd-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5a30881400, cur 1584300416 expire 1584300266 last 1584300189 Mar 15 12:26:56 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:27:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 12:27:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:30:56 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 12:30:56 fir-io7-s1 kernel: LNetError: 11825:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 12:32:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 7 seconds Mar 15 12:32:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 680 previous similar messages Mar 15 12:36:35 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client a89de017-aa40-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f1acc00, cur 1584300995 expire 1584300845 last 1584300768 Mar 15 12:36:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:36:41 fir-io7-s1 kernel: LNetError: 18568:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 12:36:41 fir-io7-s1 kernel: LNetError: 18568:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 100 previous similar messages Mar 15 12:37:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 12:37:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:40:56 fir-io7-s1 kernel: LNetError: 19542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 12:40:56 fir-io7-s1 kernel: LNetError: 19542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 12:42:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 12:42:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 660 previous similar messages Mar 15 12:46:41 fir-io7-s1 kernel: LNetError: 19875:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 12:46:41 fir-io7-s1 kernel: LNetError: 19875:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 108 previous similar messages Mar 15 12:47:05 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 39662fb6-82d0-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c828ebdb800, cur 1584301625 expire 1584301475 last 1584301398 Mar 15 12:47:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:47:50 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 12:47:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:50:21 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 6615a5ec-95a1-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cdb19400, cur 1584301821 expire 1584301671 last 1584301594 Mar 15 12:50:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 12:50:50 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 12:50:50 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 15 12:51:06 fir-io7-s1 kernel: LNetError: 19542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 12:51:06 fir-io7-s1 kernel: LNetError: 19542:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 12:52:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 15 12:52:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 699 previous similar messages Mar 15 12:56:41 fir-io7-s1 kernel: LNetError: 20120:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 12:56:41 fir-io7-s1 kernel: LNetError: 20120:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 15 12:59:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 12:59:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:00:00 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 31edf203-9092-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c73314d8000, cur 1584302400 expire 1584302250 last 1584302173 Mar 15 13:00:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:01:11 fir-io7-s1 kernel: LNetError: 20120:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 13:01:11 fir-io7-s1 kernel: LNetError: 20120:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 15 13:02:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 6 seconds Mar 15 13:02:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 494 previous similar messages Mar 15 13:06:41 fir-io7-s1 kernel: LNetError: 20120:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 13:06:41 fir-io7-s1 kernel: LNetError: 20120:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 15 13:07:04 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 9d53dab8-8b20-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79a04dac00, cur 1584302824 expire 1584302674 last 1584302597 Mar 15 13:07:04 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:11:11 fir-io7-s1 kernel: LNetError: 20718:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 13:11:11 fir-io7-s1 kernel: LNetError: 20718:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 13:12:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 13:12:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 725 previous similar messages Mar 15 13:15:07 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client ad80f0bd-64f6-4 (at 10.50.9.27@o2ib2) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c6990932c00, cur 1584303307 expire 1584303157 last 1584303080 Mar 15 13:15:07 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:15:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 13:15:58 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 13:16:41 fir-io7-s1 kernel: LNetError: 20718:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 13:16:41 fir-io7-s1 kernel: LNetError: 20718:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 133 previous similar messages Mar 15 13:19:11 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 60181d4b-3870-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f1c5d2400, cur 1584303551 expire 1584303401 last 1584303324 Mar 15 13:19:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:21:16 fir-io7-s1 kernel: LNetError: 21103:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 13:21:16 fir-io7-s1 kernel: LNetError: 21103:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 13:23:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 15 13:23:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 630 previous similar messages Mar 15 13:24:51 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 82b2127e-8a56-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7787166400, cur 1584303891 expire 1584303741 last 1584303664 Mar 15 13:24:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:26:41 fir-io7-s1 kernel: LNetError: 21438:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 13:26:41 fir-io7-s1 kernel: LNetError: 21438:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 15 13:30:49 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 13:30:49 fir-io7-s1 kernel: Lustre: Skipped 16 previous similar messages Mar 15 13:31:16 fir-io7-s1 kernel: LNetError: 21438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 13:31:16 fir-io7-s1 kernel: LNetError: 21438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 15 13:32:08 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client bb9e85d3-40f4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7ee28f9c00, cur 1584304328 expire 1584304178 last 1584304101 Mar 15 13:32:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:33:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 13:33:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 492 previous similar messages Mar 15 13:36:46 fir-io7-s1 kernel: LNetError: 21665:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 13:36:46 fir-io7-s1 kernel: LNetError: 21665:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 15 13:37:56 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 83177aa0-d276-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c80214ec000, cur 1584304676 expire 1584304526 last 1584304449 Mar 15 13:37:56 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 13:41:16 fir-io7-s1 kernel: LNetError: 21875:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 13:41:16 fir-io7-s1 kernel: LNetError: 21875:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 275 previous similar messages Mar 15 13:43:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 4 seconds Mar 15 13:43:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 523 previous similar messages Mar 15 13:46:51 fir-io7-s1 kernel: LNetError: 21875:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 13:46:51 fir-io7-s1 kernel: LNetError: 21875:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages Mar 15 13:48:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 13:48:32 fir-io7-s1 kernel: Lustre: Skipped 18 previous similar messages Mar 15 13:48:42 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client ce6a8c81-f468-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c70682c6c00, cur 1584305322 expire 1584305172 last 1584305095 Mar 15 13:48:42 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 13:51:16 fir-io7-s1 kernel: LNetError: 21875:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 13:51:16 fir-io7-s1 kernel: LNetError: 21875:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 13:53:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 2 seconds Mar 15 13:53:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 535 previous similar messages Mar 15 13:56:56 fir-io7-s1 kernel: LNetError: 21875:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 13:56:56 fir-io7-s1 kernel: LNetError: 21875:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 143 previous similar messages Mar 15 13:59:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 13:59:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 14:00:16 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 915eeacb-ee1f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79e4c40c00, cur 1584306016 expire 1584305866 last 1584305789 Mar 15 14:00:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 14:01:21 fir-io7-s1 kernel: LNetError: 22645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 14:01:21 fir-io7-s1 kernel: LNetError: 22645:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 15 14:03:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 15 14:03:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 748 previous similar messages Mar 15 14:06:56 fir-io7-s1 kernel: LNetError: 22645:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 14:06:56 fir-io7-s1 kernel: LNetError: 22645:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 152 previous similar messages Mar 15 14:11:19 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 14:11:19 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 14:11:26 fir-io7-s1 kernel: LNetError: 23047:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 14:11:26 fir-io7-s1 kernel: LNetError: 23047:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 285 previous similar messages Mar 15 14:12:23 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 89d509eb-ad03-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698e8a1800, cur 1584306743 expire 1584306593 last 1584306516 Mar 15 14:12:23 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 14:13:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 15 14:13:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 725 previous similar messages Mar 15 14:16:56 fir-io7-s1 kernel: LNetError: 23047:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 14:16:56 fir-io7-s1 kernel: LNetError: 23047:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 106 previous similar messages Mar 15 14:21:36 fir-io7-s1 kernel: LNetError: 23047:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 14:21:36 fir-io7-s1 kernel: LNetError: 23047:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 14:23:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 6 seconds Mar 15 14:23:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 816 previous similar messages Mar 15 14:26:56 fir-io7-s1 kernel: LNetError: 23047:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 14:26:56 fir-io7-s1 kernel: LNetError: 23047:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 118 previous similar messages Mar 15 14:30:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 14:30:06 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 14:31:18 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 2e658739-61c1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6c3ab74000, cur 1584307878 expire 1584307728 last 1584307651 Mar 15 14:31:18 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 14:31:46 fir-io7-s1 kernel: LNetError: 23814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 14:31:46 fir-io7-s1 kernel: LNetError: 23814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 14:33:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 15 14:33:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 726 previous similar messages Mar 15 14:36:56 fir-io7-s1 kernel: LNetError: 23814:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 14:36:56 fir-io7-s1 kernel: LNetError: 23814:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 132 previous similar messages Mar 15 14:41:16 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 14:41:16 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 14:41:56 fir-io7-s1 kernel: LNetError: 23814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 14:41:56 fir-io7-s1 kernel: LNetError: 23814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 285 previous similar messages Mar 15 14:43:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 7 seconds Mar 15 14:43:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 751 previous similar messages Mar 15 14:46:56 fir-io7-s1 kernel: LNetError: 24404:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 14:46:56 fir-io7-s1 kernel: LNetError: 24404:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 124 previous similar messages Mar 15 14:47:50 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 94878e03-ffe3-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69c1498400, cur 1584308870 expire 1584308720 last 1584308643 Mar 15 14:47:50 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 14:52:01 fir-io7-s1 kernel: LNetError: 23814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 14:52:01 fir-io7-s1 kernel: LNetError: 23814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 14:53:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 14:53:10 fir-io7-s1 kernel: Lustre: Skipped 12 previous similar messages Mar 15 14:53:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 5 seconds Mar 15 14:53:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 821 previous similar messages Mar 15 14:57:00 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 14:57:00 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 129 previous similar messages Mar 15 15:02:01 fir-io7-s1 kernel: LNetError: 24963:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 15:02:01 fir-io7-s1 kernel: LNetError: 24963:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 15:03:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 15 15:03:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 821 previous similar messages Mar 15 15:05:54 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 15:05:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 15:07:00 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 6dc0d60c-8ac7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7c1a9b9c00, cur 1584310020 expire 1584309870 last 1584309793 Mar 15 15:07:00 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 15:07:01 fir-io7-s1 kernel: LNetError: 24963:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 15:07:01 fir-io7-s1 kernel: LNetError: 24963:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 123 previous similar messages Mar 15 15:12:06 fir-io7-s1 kernel: LNetError: 24963:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 15:12:06 fir-io7-s1 kernel: LNetError: 24963:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 15:14:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 7 seconds Mar 15 15:14:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 832 previous similar messages Mar 15 15:17:06 fir-io7-s1 kernel: LNetError: 98374:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 15:17:06 fir-io7-s1 kernel: LNetError: 98374:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 134 previous similar messages Mar 15 15:18:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 15:18:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 15:19:44 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 8ceb958e-032c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698e434400, cur 1584310784 expire 1584310634 last 1584310557 Mar 15 15:19:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 15:22:06 fir-io7-s1 kernel: LNetError: 25577:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 15:22:06 fir-io7-s1 kernel: LNetError: 25577:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 273 previous similar messages Mar 15 15:24:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 9 seconds Mar 15 15:24:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 822 previous similar messages Mar 15 15:27:16 fir-io7-s1 kernel: LNetError: 25959:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 15:27:16 fir-io7-s1 kernel: LNetError: 25959:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 15 15:32:11 fir-io7-s1 kernel: LNetError: 25959:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 15:32:11 fir-io7-s1 kernel: LNetError: 25959:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 15:33:52 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 22c790a7-8451-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6fd5226000, cur 1584311632 expire 1584311482 last 1584311405 Mar 15 15:33:52 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 15:33:56 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 15:33:56 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 15:34:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 2 seconds Mar 15 15:34:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 782 previous similar messages Mar 15 15:37:21 fir-io7-s1 kernel: LNetError: 25959:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 15:37:21 fir-io7-s1 kernel: LNetError: 25959:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 152 previous similar messages Mar 15 15:42:21 fir-io7-s1 kernel: LNetError: 26532:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 15:42:21 fir-io7-s1 kernel: LNetError: 26532:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 15 15:44:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 15 15:44:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 713 previous similar messages Mar 15 15:44:50 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 408f6a50-8eb6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6991eaa800, cur 1584312290 expire 1584312140 last 1584312063 Mar 15 15:44:50 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 15:45:33 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 15:45:33 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 15:47:21 fir-io7-s1 kernel: LNetError: 26454:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 15:47:21 fir-io7-s1 kernel: LNetError: 26454:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 15 15:52:21 fir-io7-s1 kernel: LNetError: 26532:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 15:52:21 fir-io7-s1 kernel: LNetError: 26532:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 15 15:54:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 15 15:54:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 353 previous similar messages Mar 15 15:57:21 fir-io7-s1 kernel: LNetError: 26532:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 15:57:21 fir-io7-s1 kernel: LNetError: 26532:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 15 15:58:33 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client eeeff998-0fdd-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c89e5bd2000, cur 1584313113 expire 1584312963 last 1584312886 Mar 15 15:58:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 15:59:45 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 15:59:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:02:21 fir-io7-s1 kernel: LNetError: 27287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 16:02:21 fir-io7-s1 kernel: LNetError: 27287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 314 previous similar messages Mar 15 16:04:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 15 16:04:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 330 previous similar messages Mar 15 16:07:21 fir-io7-s1 kernel: LNetError: 26454:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 16:07:21 fir-io7-s1 kernel: LNetError: 26454:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 243 previous similar messages Mar 15 16:10:14 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 0f9a6011-1fe8-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c70682c7000, cur 1584313814 expire 1584313664 last 1584313587 Mar 15 16:10:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:10:14 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 16:10:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:12:26 fir-io7-s1 kernel: LNetError: 27287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 16:12:26 fir-io7-s1 kernel: LNetError: 27287:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 16:14:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 1 seconds Mar 15 16:14:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 517 previous similar messages Mar 15 16:17:26 fir-io7-s1 kernel: LNetError: 27893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 16:17:26 fir-io7-s1 kernel: LNetError: 27893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 15 16:20:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 16:20:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:21:34 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 38486563-e1a8-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cef41400, cur 1584314494 expire 1584314344 last 1584314267 Mar 15 16:21:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:22:31 fir-io7-s1 kernel: LNetError: 27893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 16:22:31 fir-io7-s1 kernel: LNetError: 27893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 16:24:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 10 seconds Mar 15 16:24:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 608 previous similar messages Mar 15 16:27:36 fir-io7-s1 kernel: LNetError: 27893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 16:27:36 fir-io7-s1 kernel: LNetError: 27893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 15 16:32:31 fir-io7-s1 kernel: LNetError: 27893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 16:32:31 fir-io7-s1 kernel: LNetError: 27893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 16:32:53 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 16:32:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:34:00 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 9567bfcb-3519-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7659e8e400, cur 1584315240 expire 1584315090 last 1584315013 Mar 15 16:34:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 16:34:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 6 seconds Mar 15 16:34:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 630 previous similar messages Mar 15 16:37:36 fir-io7-s1 kernel: LNetError: 28692:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 16:37:36 fir-io7-s1 kernel: LNetError: 28692:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 15 16:42:31 fir-io7-s1 kernel: LNetError: 27893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 16:42:31 fir-io7-s1 kernel: LNetError: 27893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 16:44:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 15 16:44:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 659 previous similar messages Mar 15 16:47:36 fir-io7-s1 kernel: LNetError: 29036:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 16:47:36 fir-io7-s1 kernel: LNetError: 29036:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 15 16:50:42 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client b06e9d07-b8e6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c69c14ac800, cur 1584316242 expire 1584316092 last 1584316015 Mar 15 16:50:42 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 16:50:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 16:50:43 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 16:52:41 fir-io7-s1 kernel: LNetError: 29224:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 16:52:41 fir-io7-s1 kernel: LNetError: 29224:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 15 16:54:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 15 16:54:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 518 previous similar messages Mar 15 16:57:36 fir-io7-s1 kernel: LNetError: 29506:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 16:57:36 fir-io7-s1 kernel: LNetError: 29506:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 15 17:02:16 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 17:02:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:02:51 fir-io7-s1 kernel: LNetError: 29416:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 17:02:51 fir-io7-s1 kernel: LNetError: 29416:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 17:03:18 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client deb9fe4c-1bab-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c751823c800, cur 1584316998 expire 1584316848 last 1584316771 Mar 15 17:03:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:04:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 15 17:04:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 649 previous similar messages Mar 15 17:07:36 fir-io7-s1 kernel: LNetError: 29821:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 17:07:36 fir-io7-s1 kernel: LNetError: 29821:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 15 17:13:01 fir-io7-s1 kernel: LNetError: 29821:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 17:13:01 fir-io7-s1 kernel: LNetError: 29821:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 17:14:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 1 seconds Mar 15 17:14:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 617 previous similar messages Mar 15 17:14:59 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 17:14:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:15:17 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client b458e0b9-b2a7-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992de5c00, cur 1584317717 expire 1584317567 last 1584317490 Mar 15 17:15:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:17:41 fir-io7-s1 kernel: LNetError: 30218:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 17:17:41 fir-io7-s1 kernel: LNetError: 30218:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 154 previous similar messages Mar 15 17:23:01 fir-io7-s1 kernel: LNetError: 30218:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 17:23:01 fir-io7-s1 kernel: LNetError: 30218:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 17:25:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 17:25:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 572 previous similar messages Mar 15 17:27:41 fir-io7-s1 kernel: LNetError: 30225:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 17:27:41 fir-io7-s1 kernel: LNetError: 30225:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 15 17:28:41 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 17:28:41 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 17:28:49 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client ef0cba82-1a1c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6992cecc00, cur 1584318529 expire 1584318379 last 1584318302 Mar 15 17:28:49 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 17:33:11 fir-io7-s1 kernel: LNetError: 30218:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 17:33:11 fir-io7-s1 kernel: LNetError: 30218:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 17:35:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 15 17:35:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 630 previous similar messages Mar 15 17:37:41 fir-io7-s1 kernel: LNetError: 30994:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 17:37:41 fir-io7-s1 kernel: LNetError: 30994:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 148 previous similar messages Mar 15 17:40:54 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 17:40:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:42:06 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 8b96fc01-90d6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c768773ec00, cur 1584319326 expire 1584319176 last 1584319099 Mar 15 17:42:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:43:16 fir-io7-s1 kernel: LNetError: 30994:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 17:43:16 fir-io7-s1 kernel: LNetError: 30994:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 17:45:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 5 seconds Mar 15 17:45:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 633 previous similar messages Mar 15 17:47:46 fir-io7-s1 kernel: LNetError: 31391:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 17:47:46 fir-io7-s1 kernel: LNetError: 31391:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 15 17:52:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 17:52:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:53:03 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client ff5b3d1d-dc07-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698eaeb000, cur 1584319983 expire 1584319833 last 1584319756 Mar 15 17:53:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 17:53:21 fir-io7-s1 kernel: LNetError: 31770:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 17:53:21 fir-io7-s1 kernel: LNetError: 31770:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 17:55:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 1 seconds Mar 15 17:55:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 635 previous similar messages Mar 15 17:57:46 fir-io7-s1 kernel: LNetError: 31508:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 17:57:46 fir-io7-s1 kernel: LNetError: 31508:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 15 18:03:26 fir-io7-s1 kernel: LNetError: 7561:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 18:03:26 fir-io7-s1 kernel: LNetError: 7561:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 15 18:04:15 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 676cf4b5-4111-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7319cf1c00, cur 1584320655 expire 1584320505 last 1584320428 Mar 15 18:04:15 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 18:04:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 18:04:55 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 18:05:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 15 18:05:19 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 585 previous similar messages Mar 15 18:07:46 fir-io7-s1 kernel: LNetError: 32169:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 18:07:46 fir-io7-s1 kernel: LNetError: 32169:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 15 18:13:26 fir-io7-s1 kernel: LNetError: 32489:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 18:13:26 fir-io7-s1 kernel: LNetError: 32489:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 15 18:15:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 18:15:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 248 previous similar messages Mar 15 18:17:46 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 18:17:46 fir-io7-s1 kernel: LNetError: 107845:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 15 18:17:52 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 18:17:52 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 18:17:55 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 4a962491-2761-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7646b47000, cur 1584321475 expire 1584321325 last 1584321248 Mar 15 18:17:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 18:23:31 fir-io7-s1 kernel: LNetError: 32909:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 18:23:31 fir-io7-s1 kernel: LNetError: 32909:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 15 18:25:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 15 18:25:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 227 previous similar messages Mar 15 18:27:46 fir-io7-s1 kernel: LNetError: 32909:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 18:27:46 fir-io7-s1 kernel: LNetError: 32909:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 15 18:30:38 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 18:30:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 18:31:42 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 33475044-ed3e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4a4c009c00, cur 1584322302 expire 1584322152 last 1584322075 Mar 15 18:31:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 18:33:31 fir-io7-s1 kernel: LNetError: 33150:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 18:33:31 fir-io7-s1 kernel: LNetError: 33150:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 317 previous similar messages Mar 15 18:35:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 15 18:35:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 246 previous similar messages Mar 15 18:37:46 fir-io7-s1 kernel: LNetError: 33453:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 18:37:46 fir-io7-s1 kernel: LNetError: 33453:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 15 18:43:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 18:43:18 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 18:43:41 fir-io7-s1 kernel: LNetError: 33537:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 18:43:41 fir-io7-s1 kernel: LNetError: 33537:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 349 previous similar messages Mar 15 18:44:28 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 2f66c7b6-88a6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79ced88800, cur 1584323068 expire 1584322918 last 1584322841 Mar 15 18:44:28 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 18:45:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 15 18:45:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 245 previous similar messages Mar 15 18:47:46 fir-io7-s1 kernel: LNetError: 33537:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 18:47:46 fir-io7-s1 kernel: LNetError: 33537:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 15 18:53:46 fir-io7-s1 kernel: LNetError: 33843:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 18:53:46 fir-io7-s1 kernel: LNetError: 33843:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 15 18:54:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 18:54:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 18:56:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 15 18:56:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 169 previous similar messages Mar 15 18:56:17 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 8dbaa4df-1310-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c702cbcb400, cur 1584323777 expire 1584323627 last 1584323550 Mar 15 18:56:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 18:57:46 fir-io7-s1 kernel: LNetError: 34073:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 18:57:46 fir-io7-s1 kernel: LNetError: 34073:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 15 19:03:51 fir-io7-s1 kernel: LNetError: 34416:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 19:03:51 fir-io7-s1 kernel: LNetError: 34416:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 15 19:06:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 15 19:06:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 198 previous similar messages Mar 15 19:07:46 fir-io7-s1 kernel: LNetError: 34623:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 19:07:46 fir-io7-s1 kernel: LNetError: 34623:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 15 19:07:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 19:07:57 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 19:09:00 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client f872d9cc-01a1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f0ab800, cur 1584324540 expire 1584324390 last 1584324313 Mar 15 19:09:00 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 19:13:51 fir-io7-s1 kernel: LNetError: 34416:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 19:13:51 fir-io7-s1 kernel: LNetError: 34416:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 15 19:16:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 15 19:16:24 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 394 previous similar messages Mar 15 19:17:51 fir-io7-s1 kernel: LNetError: 34868:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 19:17:51 fir-io7-s1 kernel: LNetError: 34868:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 15 19:19:58 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 0c24a55d-6d48-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c70682c1000, cur 1584325198 expire 1584325048 last 1584324971 Mar 15 19:19:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 19:20:23 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 19:20:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 19:23:56 fir-io7-s1 kernel: LNetError: 34868:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 19:23:56 fir-io7-s1 kernel: LNetError: 34868:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 15 19:26:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 15 19:26:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 320 previous similar messages Mar 15 19:27:51 fir-io7-s1 kernel: LNetError: 35309:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 19:27:51 fir-io7-s1 kernel: LNetError: 35309:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 15 19:34:01 fir-io7-s1 kernel: LNetError: 35251:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 19:34:01 fir-io7-s1 kernel: LNetError: 35251:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 15 19:35:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 19:35:18 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 19:35:33 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client e7a4f1fb-eed6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c782953c000, cur 1584326133 expire 1584325983 last 1584325906 Mar 15 19:35:33 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 19:36:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 15 19:36:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 370 previous similar messages Mar 15 19:37:51 fir-io7-s1 kernel: LNetError: 35639:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 19:37:51 fir-io7-s1 kernel: LNetError: 35639:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 15 19:44:01 fir-io7-s1 kernel: LNetError: 35975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 19:44:01 fir-io7-s1 kernel: LNetError: 35975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 360 previous similar messages Mar 15 19:46:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 15 19:46:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 169 previous similar messages Mar 15 19:47:03 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 02c3fc44-8921-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992d8e800, cur 1584326823 expire 1584326673 last 1584326596 Mar 15 19:47:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 19:47:51 fir-io7-s1 kernel: LNetError: 35525:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 19:47:51 fir-io7-s1 kernel: LNetError: 35525:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 15 19:48:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 19:48:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 19:54:01 fir-io7-s1 kernel: LNetError: 35975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 19:54:01 fir-io7-s1 kernel: LNetError: 35975:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 331 previous similar messages Mar 15 19:57:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 15 19:57:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 310 previous similar messages Mar 15 19:57:51 fir-io7-s1 kernel: LNetError: 36401:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 19:57:51 fir-io7-s1 kernel: LNetError: 36401:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 15 19:59:10 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 19:59:10 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 20:00:16 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 0f7d2ce4-517d-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6d026e0800, cur 1584327616 expire 1584327466 last 1584327389 Mar 15 20:00:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 20:04:06 fir-io7-s1 kernel: LNetError: 36401:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 20:04:06 fir-io7-s1 kernel: LNetError: 36401:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 334 previous similar messages Mar 15 20:07:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 4 seconds Mar 15 20:07:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 511 previous similar messages Mar 15 20:07:51 fir-io7-s1 kernel: LNetError: 36798:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 20:07:51 fir-io7-s1 kernel: LNetError: 36798:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 15 20:14:11 fir-io7-s1 kernel: LNetError: 36798:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 20:14:11 fir-io7-s1 kernel: LNetError: 36798:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 20:14:52 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 20:14:52 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 20:15:43 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 7094ab63-149b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7fbf2b6400, cur 1584328543 expire 1584328393 last 1584328316 Mar 15 20:15:43 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 20:17:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 15 20:17:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 394 previous similar messages Mar 15 20:17:51 fir-io7-s1 kernel: LNetError: 37177:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 20:17:51 fir-io7-s1 kernel: LNetError: 37177:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 15 20:24:11 fir-io7-s1 kernel: LNetError: 37177:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 20:24:11 fir-io7-s1 kernel: LNetError: 37177:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 15 20:25:23 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 20:25:23 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 15 20:26:36 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 52f62767-8d31-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c78e1f1e400, cur 1584329196 expire 1584329046 last 1584328969 Mar 15 20:26:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 20:27:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 15 20:27:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 291 previous similar messages Mar 15 20:27:51 fir-io7-s1 kernel: LNetError: 36528:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 20:27:51 fir-io7-s1 kernel: LNetError: 36528:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 15 20:34:11 fir-io7-s1 kernel: LNetError: 37901:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 20:34:11 fir-io7-s1 kernel: LNetError: 37901:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 20:37:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 15 20:37:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages Mar 15 20:37:34 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 20:37:34 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 20:37:51 fir-io7-s1 kernel: LNetError: 36528:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 20:37:51 fir-io7-s1 kernel: LNetError: 36528:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 15 20:38:36 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 04afba0c-55de-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f73d000, cur 1584329916 expire 1584329766 last 1584329689 Mar 15 20:38:36 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 20:44:11 fir-io7-s1 kernel: LNetError: 107845:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 20:44:11 fir-io7-s1 kernel: LNetError: 107845:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 342 previous similar messages Mar 15 20:47:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 3 seconds Mar 15 20:47:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 441 previous similar messages Mar 15 20:47:57 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 20:47:57 fir-io7-s1 kernel: LNetError: 8893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 15 20:49:24 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 1e88b10a-7c97-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c73314da400, cur 1584330564 expire 1584330414 last 1584330337 Mar 15 20:49:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 20:49:27 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 20:49:27 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 20:54:21 fir-io7-s1 kernel: LNetError: 38208:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 20:54:21 fir-io7-s1 kernel: LNetError: 38208:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 15 20:57:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 1 seconds Mar 15 20:57:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 428 previous similar messages Mar 15 20:57:57 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 20:57:57 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 15 21:00:19 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 98083cbe-ac97-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c73cf43e800, cur 1584331219 expire 1584331069 last 1584330992 Mar 15 21:00:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 21:01:34 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 21:01:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 21:04:21 fir-io7-s1 kernel: LNetError: 38701:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 21:04:21 fir-io7-s1 kernel: LNetError: 38701:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 342 previous similar messages Mar 15 21:07:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 15 21:07:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 289 previous similar messages Mar 15 21:08:01 fir-io7-s1 kernel: LNetError: 38701:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 21:08:01 fir-io7-s1 kernel: LNetError: 38701:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 15 21:12:03 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 6ed03cbd-7fc7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd492800, cur 1584331923 expire 1584331773 last 1584331696 Mar 15 21:12:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 21:13:34 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 21:13:34 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 21:14:31 fir-io7-s1 kernel: LNetError: 38701:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 21:14:31 fir-io7-s1 kernel: LNetError: 38701:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 326 previous similar messages Mar 15 21:17:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 6 seconds Mar 15 21:17:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 449 previous similar messages Mar 15 21:18:06 fir-io7-s1 kernel: LNetError: 39480:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 21:18:06 fir-io7-s1 kernel: LNetError: 39480:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 15 21:24:36 fir-io7-s1 kernel: LNetError: 39742:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 21:24:36 fir-io7-s1 kernel: LNetError: 39742:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 15 21:27:11 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 21:27:11 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 21:27:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 2 seconds Mar 15 21:27:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 389 previous similar messages Mar 15 21:28:06 fir-io7-s1 kernel: LNetError: 39628:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 21:28:06 fir-io7-s1 kernel: LNetError: 39628:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 15 21:28:22 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client dcb98cde-fefb-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8a39b17400, cur 1584332902 expire 1584332752 last 1584332675 Mar 15 21:28:22 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 21:34:41 fir-io7-s1 kernel: LNetError: 40068:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 21:34:41 fir-io7-s1 kernel: LNetError: 40068:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 21:37:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 15 21:37:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 432 previous similar messages Mar 15 21:38:06 fir-io7-s1 kernel: LNetError: 38592:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 21:38:06 fir-io7-s1 kernel: LNetError: 38592:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 15 21:41:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 21:41:57 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 21:42:56 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 3453bf67-751b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c69903b5c00, cur 1584333776 expire 1584333626 last 1584333549 Mar 15 21:42:56 fir-io7-s1 kernel: Lustre: Skipped 17 previous similar messages Mar 15 21:44:46 fir-io7-s1 kernel: LNetError: 40341:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 21:44:46 fir-io7-s1 kernel: LNetError: 40341:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 15 21:48:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 15 21:48:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 444 previous similar messages Mar 15 21:48:06 fir-io7-s1 kernel: LNetError: 40656:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 21:48:06 fir-io7-s1 kernel: LNetError: 40656:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 15 21:52:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 21:52:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 21:53:29 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 08e25698-9083-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6f91f95400, cur 1584334409 expire 1584334259 last 1584334182 Mar 15 21:53:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 21:54:46 fir-io7-s1 kernel: LNetError: 40656:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 21:54:46 fir-io7-s1 kernel: LNetError: 40656:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 15 21:58:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 21:58:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 745 previous similar messages Mar 15 21:58:06 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 21:58:06 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 15 22:04:46 fir-io7-s1 kernel: LNetError: 41130:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 22:04:46 fir-io7-s1 kernel: LNetError: 41130:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 335 previous similar messages Mar 15 22:08:06 fir-io7-s1 kernel: LNetError: 41473:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 22:08:06 fir-io7-s1 kernel: LNetError: 41473:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 15 22:08:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 6 seconds Mar 15 22:08:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 682 previous similar messages Mar 15 22:14:51 fir-io7-s1 kernel: LNetError: 41550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 22:14:51 fir-io7-s1 kernel: LNetError: 41550:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 15 22:18:06 fir-io7-s1 kernel: LNetError: 41915:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 22:18:06 fir-io7-s1 kernel: LNetError: 41915:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 15 22:18:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 2 seconds Mar 15 22:18:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 588 previous similar messages Mar 15 22:24:56 fir-io7-s1 kernel: LNetError: 41915:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 22:24:56 fir-io7-s1 kernel: LNetError: 41915:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 15 22:28:16 fir-io7-s1 kernel: LNetError: 42305:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 22:28:16 fir-io7-s1 kernel: LNetError: 42305:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 120 previous similar messages Mar 15 22:28:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 5 seconds Mar 15 22:28:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 691 previous similar messages Mar 15 22:34:56 fir-io7-s1 kernel: LNetError: 42305:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 15 22:34:56 fir-io7-s1 kernel: LNetError: 42305:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 313 previous similar messages Mar 15 22:36:20 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 22:36:20 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:37:29 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 422779b7-409c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6cfcfb0400, cur 1584337049 expire 1584336899 last 1584336822 Mar 15 22:37:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:38:26 fir-io7-s1 kernel: LNetError: 42305:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 22:38:26 fir-io7-s1 kernel: LNetError: 42305:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 15 22:38:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 15 22:38:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 539 previous similar messages Mar 15 22:44:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 22:44:28 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:45:01 fir-io7-s1 kernel: LNetError: 42814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 22:45:01 fir-io7-s1 kernel: LNetError: 42814:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 15 22:45:34 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client d1cd2da8-95ff-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4a4c00e800, cur 1584337534 expire 1584337384 last 1584337307 Mar 15 22:45:34 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:47:03 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 22:47:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:48:15 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client ce4d8772-392a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f49ec00, cur 1584337695 expire 1584337545 last 1584337468 Mar 15 22:48:15 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:48:26 fir-io7-s1 kernel: LNetError: 42493:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 22:48:26 fir-io7-s1 kernel: LNetError: 42493:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 15 22:48:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 15 22:48:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 619 previous similar messages Mar 15 22:55:06 fir-io7-s1 kernel: LNetError: 43073:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 22:55:06 fir-io7-s1 kernel: LNetError: 43073:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 15 22:58:31 fir-io7-s1 kernel: LNetError: 43462:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 22:58:31 fir-io7-s1 kernel: LNetError: 43462:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 15 22:58:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 15 22:58:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 523 previous similar messages Mar 15 22:58:36 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 22:58:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 22:59:26 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 85864807-01f7-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6c456b3400, cur 1584338366 expire 1584338216 last 1584338139 Mar 15 22:59:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 23:05:06 fir-io7-s1 kernel: LNetError: 43462:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 15 23:05:06 fir-io7-s1 kernel: LNetError: 43462:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 15 23:08:31 fir-io7-s1 kernel: LNetError: 43787:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 23:08:31 fir-io7-s1 kernel: LNetError: 43787:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 15 23:08:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 15 23:08:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 15 23:15:06 fir-io7-s1 kernel: LNetError: 43860:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 23:15:06 fir-io7-s1 kernel: LNetError: 43860:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 15 23:18:31 fir-io7-s1 kernel: LNetError: 44249:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 23:18:31 fir-io7-s1 kernel: LNetError: 44249:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 15 23:18:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 15 23:18:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 531 previous similar messages Mar 15 23:24:14 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 5f48b2b4-9058-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f0a8000, cur 1584339854 expire 1584339704 last 1584339627 Mar 15 23:24:14 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 15 23:24:24 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 23:24:24 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 15 23:25:11 fir-io7-s1 kernel: LNetError: 44249:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 15 23:25:11 fir-io7-s1 kernel: LNetError: 44249:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 15 23:28:31 fir-io7-s1 kernel: LNetError: 44457:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 23:28:31 fir-io7-s1 kernel: LNetError: 44457:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 15 23:28:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 7 seconds Mar 15 23:28:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 612 previous similar messages Mar 15 23:29:35 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 5c21d9dc-c7cb-4 (at 10.50.9.27@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c78e1f1a400, cur 1584340175 expire 1584340025 last 1584339948 Mar 15 23:29:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 23:34:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2f501a16-5689-4 (at 10.50.9.27@o2ib2) Mar 15 23:34:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 23:35:11 fir-io7-s1 kernel: LNetError: 38456:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 15 23:35:11 fir-io7-s1 kernel: LNetError: 38456:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 15 23:36:09 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 70e227b0-08c3-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7319cf1000, cur 1584340569 expire 1584340419 last 1584340342 Mar 15 23:36:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 23:38:36 fir-io7-s1 kernel: LNetError: 44457:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 23:38:36 fir-io7-s1 kernel: LNetError: 44457:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 15 23:39:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 15 23:39:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 419 previous similar messages Mar 15 23:45:21 fir-io7-s1 kernel: LNetError: 45250:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 15 23:45:21 fir-io7-s1 kernel: LNetError: 45250:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 15 23:48:36 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 6ac912fd-26da-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f5e6800, cur 1584341316 expire 1584341166 last 1584341089 Mar 15 23:48:36 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 15 23:48:36 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 15 23:48:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 15 23:48:41 fir-io7-s1 kernel: LNetError: 45467:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 15 23:48:41 fir-io7-s1 kernel: LNetError: 45467:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 15 23:49:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 15 23:49:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 597 previous similar messages Mar 15 23:55:21 fir-io7-s1 kernel: LNetError: 45467:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 15 23:55:21 fir-io7-s1 kernel: LNetError: 45467:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 284 previous similar messages Mar 15 23:58:41 fir-io7-s1 kernel: LNetError: 44655:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 15 23:58:41 fir-io7-s1 kernel: LNetError: 44655:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 15 23:59:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 15 23:59:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 653 previous similar messages Mar 16 00:01:15 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 00:01:15 fir-io7-s1 kernel: Lustre: Skipped 9 previous similar messages Mar 16 00:02:26 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 56704a97-9313-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c73314dc800, cur 1584342146 expire 1584341996 last 1584341919 Mar 16 00:02:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 00:05:21 fir-io7-s1 kernel: LNetError: 45787:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 16 00:05:21 fir-io7-s1 kernel: LNetError: 45787:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 00:08:51 fir-io7-s1 kernel: LNetError: 46268:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 00:08:51 fir-io7-s1 kernel: LNetError: 46268:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 16 00:09:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 10 seconds Mar 16 00:09:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 507 previous similar messages Mar 16 00:09:14 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 00:09:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 00:15:26 fir-io7-s1 kernel: LNetError: 46268:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 16 00:15:26 fir-io7-s1 kernel: LNetError: 46268:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 00:16:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 00:16:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 00:17:13 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 62478f0c-53dc-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c78e1f1b400, cur 1584343033 expire 1584342883 last 1584342806 Mar 16 00:17:13 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 00:18:51 fir-io7-s1 kernel: LNetError: 46572:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 00:18:51 fir-io7-s1 kernel: LNetError: 46572:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 139 previous similar messages Mar 16 00:19:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 16 00:19:10 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 644 previous similar messages Mar 16 00:24:05 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 00:24:05 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 00:25:26 fir-io7-s1 kernel: LNetError: 46572:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 00:25:26 fir-io7-s1 kernel: LNetError: 46572:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 283 previous similar messages Mar 16 00:28:51 fir-io7-s1 kernel: LNetError: 46958:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 00:28:51 fir-io7-s1 kernel: LNetError: 46958:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 16 00:29:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 16 00:29:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 583 previous similar messages Mar 16 00:35:26 fir-io7-s1 kernel: LNetError: 46958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 00:35:26 fir-io7-s1 kernel: LNetError: 46958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 16 00:35:36 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 00:35:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 00:36:39 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 6969ba53-9b27-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c768773fc00, cur 1584344199 expire 1584344049 last 1584343972 Mar 16 00:36:39 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 00:38:51 fir-io7-s1 kernel: LNetError: 47225:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 00:38:51 fir-io7-s1 kernel: LNetError: 47225:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 16 00:39:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 16 00:39:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 335 previous similar messages Mar 16 00:45:36 fir-io7-s1 kernel: LNetError: 47335:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 00:45:36 fir-io7-s1 kernel: LNetError: 47335:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 00:47:56 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 00:47:56 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 00:48:51 fir-io7-s1 kernel: LNetError: 47742:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 00:48:51 fir-io7-s1 kernel: LNetError: 47742:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 16 00:49:01 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 2305de8e-4938-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c73265edc00, cur 1584344941 expire 1584344791 last 1584344714 Mar 16 00:49:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 00:49:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 9 seconds Mar 16 00:49:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 606 previous similar messages Mar 16 00:55:36 fir-io7-s1 kernel: LNetError: 47725:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 00:55:36 fir-io7-s1 kernel: LNetError: 47725:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 00:58:56 fir-io7-s1 kernel: LNetError: 48108:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 00:58:56 fir-io7-s1 kernel: LNetError: 48108:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 147 previous similar messages Mar 16 00:59:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 16 00:59:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 590 previous similar messages Mar 16 01:01:40 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 01:01:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 01:01:46 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 26fa0bc5-6d05-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7a01e01800, cur 1584345706 expire 1584345556 last 1584345479 Mar 16 01:01:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 01:05:36 fir-io7-s1 kernel: LNetError: 48306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 01:05:36 fir-io7-s1 kernel: LNetError: 48306:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 01:09:06 fir-io7-s1 kernel: LNetError: 48306:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 01:09:06 fir-io7-s1 kernel: LNetError: 48306:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages Mar 16 01:09:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 10 seconds Mar 16 01:09:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 614 previous similar messages Mar 16 01:15:41 fir-io7-s1 kernel: LNetError: 48649:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 01:15:41 fir-io7-s1 kernel: LNetError: 48649:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 01:15:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 01:15:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 01:16:45 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 5ae147cc-2e90-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a2f186000, cur 1584346605 expire 1584346455 last 1584346378 Mar 16 01:16:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 01:19:11 fir-io7-s1 kernel: LNetError: 48893:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 01:19:11 fir-io7-s1 kernel: LNetError: 48893:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 116 previous similar messages Mar 16 01:19:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 4 seconds Mar 16 01:19:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 660 previous similar messages Mar 16 01:25:41 fir-io7-s1 kernel: LNetError: 48893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 01:25:41 fir-io7-s1 kernel: LNetError: 48893:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 273 previous similar messages Mar 16 01:29:16 fir-io7-s1 kernel: LNetError: 48881:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 01:29:16 fir-io7-s1 kernel: LNetError: 48881:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages Mar 16 01:29:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 16 01:29:41 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 640 previous similar messages Mar 16 01:35:41 fir-io7-s1 kernel: LNetError: 49274:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 16 01:35:41 fir-io7-s1 kernel: LNetError: 49274:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 286 previous similar messages Mar 16 01:39:21 fir-io7-s1 kernel: LNetError: 49274:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 01:39:21 fir-io7-s1 kernel: LNetError: 49274:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 213 previous similar messages Mar 16 01:39:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 01:39:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 498 previous similar messages Mar 16 01:45:51 fir-io7-s1 kernel: LNetError: 49795:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 01:45:51 fir-io7-s1 kernel: LNetError: 49795:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 01:49:26 fir-io7-s1 kernel: LNetError: 50097:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 01:49:26 fir-io7-s1 kernel: LNetError: 50097:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 164 previous similar messages Mar 16 01:50:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 7 seconds Mar 16 01:50:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 529 previous similar messages Mar 16 01:55:56 fir-io7-s1 kernel: LNetError: 50392:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 01:55:56 fir-io7-s1 kernel: LNetError: 50392:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 16 01:59:02 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client d6a4a8f7-79b9-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8a2ee1b400, cur 1584349142 expire 1584348992 last 1584348915 Mar 16 01:59:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 01:59:26 fir-io7-s1 kernel: LNetError: 50097:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 01:59:26 fir-io7-s1 kernel: LNetError: 50097:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 156 previous similar messages Mar 16 02:00:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 10 seconds Mar 16 02:00:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 657 previous similar messages Mar 16 02:00:22 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:00:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:05:56 fir-io7-s1 kernel: LNetError: 50689:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 02:05:56 fir-io7-s1 kernel: LNetError: 50689:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 02:09:19 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:09:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:09:28 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 02:09:28 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 16 02:10:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 1 seconds Mar 16 02:10:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 478 previous similar messages Mar 16 02:10:26 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 9dc5b47c-9933-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4a642e3000, cur 1584349826 expire 1584349676 last 1584349599 Mar 16 02:10:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:16:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:16:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:16:06 fir-io7-s1 kernel: LNetError: 51265:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 02:16:06 fir-io7-s1 kernel: LNetError: 51265:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 02:17:17 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 8ad41d1d-31af-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c54d4902000, cur 1584350237 expire 1584350087 last 1584350010 Mar 16 02:17:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:19:31 fir-io7-s1 kernel: LNetError: 49743:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 02:19:31 fir-io7-s1 kernel: LNetError: 49743:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 130 previous similar messages Mar 16 02:20:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 16 02:20:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 645 previous similar messages Mar 16 02:22:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:22:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:23:39 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 5dc9a79e-4d33-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6f5178dc00, cur 1584350619 expire 1584350469 last 1584350392 Mar 16 02:23:39 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:26:11 fir-io7-s1 kernel: LNetError: 51265:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 02:26:11 fir-io7-s1 kernel: LNetError: 51265:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 02:29:31 fir-io7-s1 kernel: LNetError: 51653:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 02:29:31 fir-io7-s1 kernel: LNetError: 51653:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 16 02:30:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 16 02:30:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 409 previous similar messages Mar 16 02:31:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:31:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:36:11 fir-io7-s1 kernel: LNetError: 51653:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 02:36:11 fir-io7-s1 kernel: LNetError: 51653:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 16 02:39:31 fir-io7-s1 kernel: LNetError: 49743:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 02:39:31 fir-io7-s1 kernel: LNetError: 49743:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 16 02:40:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 9 seconds Mar 16 02:40:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 565 previous similar messages Mar 16 02:40:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:40:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:42:10 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 1dda5d28-9d40-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4a1a4b0c00, cur 1584351730 expire 1584351580 last 1584351503 Mar 16 02:42:10 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 02:46:11 fir-io7-s1 kernel: LNetError: 52384:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 02:46:11 fir-io7-s1 kernel: LNetError: 52384:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 16 02:47:09 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 02:47:09 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 02:49:31 fir-io7-s1 kernel: LNetError: 52384:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 02:49:31 fir-io7-s1 kernel: LNetError: 52384:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 138 previous similar messages Mar 16 02:50:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 16 02:50:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 621 previous similar messages Mar 16 02:56:11 fir-io7-s1 kernel: LNetError: 52384:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 02:56:11 fir-io7-s1 kernel: LNetError: 52384:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 02:59:31 fir-io7-s1 kernel: LNetError: 52793:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 02:59:31 fir-io7-s1 kernel: LNetError: 52793:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 16 03:00:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 6 seconds Mar 16 03:00:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 542 previous similar messages Mar 16 03:06:11 fir-io7-s1 kernel: LNetError: 52793:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 03:06:11 fir-io7-s1 kernel: LNetError: 52793:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 03:09:36 fir-io7-s1 kernel: LNetError: 53193:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 03:09:36 fir-io7-s1 kernel: LNetError: 53193:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 16 03:10:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 2 seconds Mar 16 03:10:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 566 previous similar messages Mar 16 03:16:21 fir-io7-s1 kernel: LNetError: 53193:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 03:16:21 fir-io7-s1 kernel: LNetError: 53193:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 03:19:36 fir-io7-s1 kernel: LNetError: 53270:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 03:19:36 fir-io7-s1 kernel: LNetError: 53270:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 152 previous similar messages Mar 16 03:20:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 16 03:20:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 588 previous similar messages Mar 16 03:26:21 fir-io7-s1 kernel: LNetError: 53575:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 03:26:21 fir-io7-s1 kernel: LNetError: 53575:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 03:29:36 fir-io7-s1 kernel: LNetError: 53270:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 03:29:36 fir-io7-s1 kernel: LNetError: 53270:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 142 previous similar messages Mar 16 03:30:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 16 03:30:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 516 previous similar messages Mar 16 03:36:21 fir-io7-s1 kernel: LNetError: 53958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 03:36:21 fir-io7-s1 kernel: LNetError: 53958:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 16 03:39:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 03:39:40 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 148 previous similar messages Mar 16 03:40:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 16 03:40:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 494 previous similar messages Mar 16 03:46:21 fir-io7-s1 kernel: LNetError: 54365:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 03:46:21 fir-io7-s1 kernel: LNetError: 54365:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 03:49:46 fir-io7-s1 kernel: LNetError: 54743:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 03:49:46 fir-io7-s1 kernel: LNetError: 54743:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 16 03:50:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 4 seconds Mar 16 03:50:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 607 previous similar messages Mar 16 03:56:21 fir-io7-s1 kernel: LNetError: 41576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 03:56:21 fir-io7-s1 kernel: LNetError: 41576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 286 previous similar messages Mar 16 03:59:51 fir-io7-s1 kernel: LNetError: 55121:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 03:59:51 fir-io7-s1 kernel: LNetError: 55121:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 154 previous similar messages Mar 16 04:01:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 6 seconds Mar 16 04:01:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 571 previous similar messages Mar 16 04:06:26 fir-io7-s1 kernel: LNetError: 55121:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 04:06:26 fir-io7-s1 kernel: LNetError: 55121:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 04:09:51 fir-io7-s1 kernel: LNetError: 55519:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 04:09:51 fir-io7-s1 kernel: LNetError: 55519:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 16 04:11:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 04:11:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 509 previous similar messages Mar 16 04:16:26 fir-io7-s1 kernel: LNetError: 55519:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 04:16:26 fir-io7-s1 kernel: LNetError: 55519:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 278 previous similar messages Mar 16 04:19:51 fir-io7-s1 kernel: LNetError: 55735:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 04:19:51 fir-io7-s1 kernel: LNetError: 55735:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 16 04:21:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 2 seconds Mar 16 04:21:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 534 previous similar messages Mar 16 04:26:26 fir-io7-s1 kernel: LNetError: 55906:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 04:26:26 fir-io7-s1 kernel: LNetError: 55906:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 281 previous similar messages Mar 16 04:29:56 fir-io7-s1 kernel: LNetError: 56356:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 04:29:56 fir-io7-s1 kernel: LNetError: 56356:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 150 previous similar messages Mar 16 04:31:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.238@o2ib7: 0 seconds Mar 16 04:31:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 658 previous similar messages Mar 16 04:36:36 fir-io7-s1 kernel: LNetError: 56284:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 04:36:36 fir-io7-s1 kernel: LNetError: 56284:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 16 04:39:56 fir-io7-s1 kernel: LNetError: 55948:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 04:39:56 fir-io7-s1 kernel: LNetError: 55948:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 16 04:41:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 16 04:41:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 562 previous similar messages Mar 16 04:46:36 fir-io7-s1 kernel: LNetError: 56672:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 04:46:36 fir-io7-s1 kernel: LNetError: 56672:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 04:50:01 fir-io7-s1 kernel: LNetError: 57057:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 04:50:01 fir-io7-s1 kernel: LNetError: 57057:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 16 04:51:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 16 04:51:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 450 previous similar messages Mar 16 04:56:36 fir-io7-s1 kernel: LNetError: 57319:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 04:56:36 fir-io7-s1 kernel: LNetError: 57319:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 16 05:00:01 fir-io7-s1 kernel: LNetError: 57319:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 05:00:01 fir-io7-s1 kernel: LNetError: 57319:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 16 05:01:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 3 seconds Mar 16 05:01:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 504 previous similar messages Mar 16 05:06:36 fir-io7-s1 kernel: LNetError: 57319:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 05:06:36 fir-io7-s1 kernel: LNetError: 57319:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 05:10:01 fir-io7-s1 kernel: LNetError: 57852:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 05:10:01 fir-io7-s1 kernel: LNetError: 57852:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 156 previous similar messages Mar 16 05:11:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 16 05:11:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 413 previous similar messages Mar 16 05:16:36 fir-io7-s1 kernel: LNetError: 57829:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 16 05:16:36 fir-io7-s1 kernel: LNetError: 57829:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 05:20:01 fir-io7-s1 kernel: LNetError: 58209:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 05:20:01 fir-io7-s1 kernel: LNetError: 58209:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 161 previous similar messages Mar 16 05:21:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 16 05:21:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 434 previous similar messages Mar 16 05:26:41 fir-io7-s1 kernel: LNetError: 58209:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 05:26:41 fir-io7-s1 kernel: LNetError: 58209:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 05:30:01 fir-io7-s1 kernel: LNetError: 58209:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 05:30:01 fir-io7-s1 kernel: LNetError: 58209:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 16 05:31:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 2 seconds Mar 16 05:31:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 409 previous similar messages Mar 16 05:36:41 fir-io7-s1 kernel: LNetError: 58732:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 05:36:41 fir-io7-s1 kernel: LNetError: 58732:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 05:40:06 fir-io7-s1 kernel: LNetError: 58968:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 05:40:06 fir-io7-s1 kernel: LNetError: 58968:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 16 05:41:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 5 seconds Mar 16 05:41:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 471 previous similar messages Mar 16 05:46:46 fir-io7-s1 kernel: LNetError: 58968:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 05:46:46 fir-io7-s1 kernel: LNetError: 58968:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 05:50:06 fir-io7-s1 kernel: LNetError: 59267:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 05:50:06 fir-io7-s1 kernel: LNetError: 59267:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 16 05:51:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 4 seconds Mar 16 05:51:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 497 previous similar messages Mar 16 05:56:46 fir-io7-s1 kernel: LNetError: 59348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 05:56:46 fir-io7-s1 kernel: LNetError: 59348:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 06:00:11 fir-io7-s1 kernel: LNetError: 57573:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 06:00:11 fir-io7-s1 kernel: LNetError: 57573:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 157 previous similar messages Mar 16 06:01:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 16 06:01:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 446 previous similar messages Mar 16 06:06:46 fir-io7-s1 kernel: LNetError: 60077:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 06:06:46 fir-io7-s1 kernel: LNetError: 60077:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 16 06:10:11 fir-io7-s1 kernel: LNetError: 60077:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 06:10:11 fir-io7-s1 kernel: LNetError: 60077:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 16 06:11:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 6 seconds Mar 16 06:11:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 320 previous similar messages Mar 16 06:16:56 fir-io7-s1 kernel: LNetError: 60324:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 06:16:56 fir-io7-s1 kernel: LNetError: 60324:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 16 06:20:11 fir-io7-s1 kernel: LNetError: 60107:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 06:20:11 fir-io7-s1 kernel: LNetError: 60107:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 153 previous similar messages Mar 16 06:22:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 16 06:22:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 534 previous similar messages Mar 16 06:26:56 fir-io7-s1 kernel: LNetError: 60618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 06:26:56 fir-io7-s1 kernel: LNetError: 60618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 16 06:30:16 fir-io7-s1 kernel: LNetError: 60896:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 06:30:16 fir-io7-s1 kernel: LNetError: 60896:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 157 previous similar messages Mar 16 06:32:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 1 seconds Mar 16 06:32:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 545 previous similar messages Mar 16 06:36:56 fir-io7-s1 kernel: LNetError: 60896:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 06:36:56 fir-io7-s1 kernel: LNetError: 60896:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 16 06:40:16 fir-io7-s1 kernel: LNetError: 61277:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 06:40:16 fir-io7-s1 kernel: LNetError: 61277:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 16 06:42:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 16 06:42:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 552 previous similar messages Mar 16 06:47:01 fir-io7-s1 kernel: LNetError: 61277:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 06:47:01 fir-io7-s1 kernel: LNetError: 61277:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 16 06:50:16 fir-io7-s1 kernel: LNetError: 61660:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 06:50:16 fir-io7-s1 kernel: LNetError: 61660:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 16 06:52:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 3 seconds Mar 16 06:52:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 486 previous similar messages Mar 16 06:57:06 fir-io7-s1 kernel: LNetError: 61660:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 06:57:06 fir-io7-s1 kernel: LNetError: 61660:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 16 07:00:16 fir-io7-s1 kernel: LNetError: 62047:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 07:00:16 fir-io7-s1 kernel: LNetError: 62047:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 16 07:02:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 16 07:02:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 489 previous similar messages Mar 16 07:07:16 fir-io7-s1 kernel: LNetError: 62047:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 07:07:16 fir-io7-s1 kernel: LNetError: 62047:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 07:10:16 fir-io7-s1 kernel: LNetError: 62439:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 07:10:16 fir-io7-s1 kernel: LNetError: 62439:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 141 previous similar messages Mar 16 07:12:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 16 07:12:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 545 previous similar messages Mar 16 07:17:26 fir-io7-s1 kernel: LNetError: 41576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 07:17:26 fir-io7-s1 kernel: LNetError: 41576:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 16 07:20:16 fir-io7-s1 kernel: LNetError: 62864:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 07:20:16 fir-io7-s1 kernel: LNetError: 62864:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 16 07:22:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 16 07:22:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 552 previous similar messages Mar 16 07:27:31 fir-io7-s1 kernel: LNetError: 62439:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 07:27:31 fir-io7-s1 kernel: LNetError: 62439:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 07:30:26 fir-io7-s1 kernel: LNetError: 63202:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 07:30:26 fir-io7-s1 kernel: LNetError: 63202:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 145 previous similar messages Mar 16 07:32:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 2 seconds Mar 16 07:32:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 592 previous similar messages Mar 16 07:37:36 fir-io7-s1 kernel: LNetError: 63578:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 07:37:36 fir-io7-s1 kernel: LNetError: 63578:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 07:40:26 fir-io7-s1 kernel: LNetError: 63578:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 07:40:26 fir-io7-s1 kernel: LNetError: 63578:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 16 07:42:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 9 seconds Mar 16 07:42:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 466 previous similar messages Mar 16 07:47:46 fir-io7-s1 kernel: LNetError: 63578:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 07:47:46 fir-io7-s1 kernel: LNetError: 63578:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 07:50:26 fir-io7-s1 kernel: LNetError: 62864:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 07:50:26 fir-io7-s1 kernel: LNetError: 62864:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 140 previous similar messages Mar 16 07:52:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 11 seconds Mar 16 07:52:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 622 previous similar messages Mar 16 07:57:46 fir-io7-s1 kernel: LNetError: 63979:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 07:57:46 fir-io7-s1 kernel: LNetError: 63979:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 273 previous similar messages Mar 16 08:00:31 fir-io7-s1 kernel: LNetError: 64381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 08:00:31 fir-io7-s1 kernel: LNetError: 64381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 16 08:02:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 16 08:02:34 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 555 previous similar messages Mar 16 08:07:46 fir-io7-s1 kernel: LNetError: 64618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 08:07:46 fir-io7-s1 kernel: LNetError: 64618:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 08:10:31 fir-io7-s1 kernel: LNetError: 64223:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 08:10:31 fir-io7-s1 kernel: LNetError: 64223:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 16 08:12:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 2 seconds Mar 16 08:12:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 410 previous similar messages Mar 16 08:17:56 fir-io7-s1 kernel: LNetError: 64941:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 08:17:56 fir-io7-s1 kernel: LNetError: 64941:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 08:20:36 fir-io7-s1 kernel: LNetError: 65280:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 08:20:36 fir-io7-s1 kernel: LNetError: 65280:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 16 08:22:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 9 seconds Mar 16 08:22:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 507 previous similar messages Mar 16 08:28:06 fir-io7-s1 kernel: LNetError: 65280:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 08:28:06 fir-io7-s1 kernel: LNetError: 65280:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 08:30:36 fir-io7-s1 kernel: LNetError: 65676:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 08:30:36 fir-io7-s1 kernel: LNetError: 65676:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 16 08:32:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 1 seconds Mar 16 08:32:53 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 471 previous similar messages Mar 16 08:38:06 fir-io7-s1 kernel: LNetError: 65676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 08:38:06 fir-io7-s1 kernel: LNetError: 65676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 16 08:40:36 fir-io7-s1 kernel: LNetError: 66062:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 08:40:36 fir-io7-s1 kernel: LNetError: 66062:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 16 08:42:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 16 08:42:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 534 previous similar messages Mar 16 08:48:06 fir-io7-s1 kernel: LNetError: 66062:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 08:48:06 fir-io7-s1 kernel: LNetError: 66062:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 08:50:36 fir-io7-s1 kernel: LNetError: 66455:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 08:50:36 fir-io7-s1 kernel: LNetError: 66455:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 188 previous similar messages Mar 16 08:52:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 0 seconds Mar 16 08:52:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 301 previous similar messages Mar 16 08:58:06 fir-io7-s1 kernel: LNetError: 66455:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 08:58:06 fir-io7-s1 kernel: LNetError: 66455:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 305 previous similar messages Mar 16 09:00:36 fir-io7-s1 kernel: LNetError: 66847:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 09:00:36 fir-io7-s1 kernel: LNetError: 66847:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 207 previous similar messages Mar 16 09:03:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 16 09:03:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 16 09:08:06 fir-io7-s1 kernel: LNetError: 67206:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 16 09:08:06 fir-io7-s1 kernel: LNetError: 67206:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 09:10:36 fir-io7-s1 kernel: LNetError: 67206:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 09:10:36 fir-io7-s1 kernel: LNetError: 67206:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 16 09:13:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 16 09:13:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 376 previous similar messages Mar 16 09:18:06 fir-io7-s1 kernel: LNetError: 67408:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 09:18:06 fir-io7-s1 kernel: LNetError: 67408:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 16 09:20:36 fir-io7-s1 kernel: LNetError: 67612:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 09:20:36 fir-io7-s1 kernel: LNetError: 67612:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 16 09:23:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 6 seconds Mar 16 09:23:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 178 previous similar messages Mar 16 09:28:06 fir-io7-s1 kernel: LNetError: 67712:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 09:28:06 fir-io7-s1 kernel: LNetError: 67712:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 09:30:36 fir-io7-s1 kernel: LNetError: 68030:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 09:30:36 fir-io7-s1 kernel: LNetError: 68030:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 16 09:33:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 4 seconds Mar 16 09:33:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 478 previous similar messages Mar 16 09:38:06 fir-io7-s1 kernel: LNetError: 67712:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 09:38:06 fir-io7-s1 kernel: LNetError: 67712:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 09:40:36 fir-io7-s1 kernel: LNetError: 68235:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 09:40:36 fir-io7-s1 kernel: LNetError: 68235:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 195 previous similar messages Mar 16 09:43:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 3 seconds Mar 16 09:43:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 494 previous similar messages Mar 16 09:48:11 fir-io7-s1 kernel: LNetError: 68982:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 09:48:11 fir-io7-s1 kernel: LNetError: 68982:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 16 09:50:36 fir-io7-s1 kernel: LNetError: 68235:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 09:50:36 fir-io7-s1 kernel: LNetError: 68235:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 16 09:53:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 09:53:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 16 09:58:11 fir-io7-s1 kernel: LNetError: 69474:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 09:58:11 fir-io7-s1 kernel: LNetError: 69474:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 282 previous similar messages Mar 16 10:00:36 fir-io7-s1 kernel: LNetError: 69762:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 10:00:36 fir-io7-s1 kernel: LNetError: 69762:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 16 10:03:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 16 10:03:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 569 previous similar messages Mar 16 10:08:16 fir-io7-s1 kernel: LNetError: 69762:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 10:08:16 fir-io7-s1 kernel: LNetError: 69762:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 10:10:37 fir-io7-s1 kernel: LNetError: 69945:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 10:10:37 fir-io7-s1 kernel: LNetError: 69945:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 16 10:13:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 16 10:13:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages Mar 16 10:18:22 fir-io7-s1 kernel: LNetError: 70158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 10:18:22 fir-io7-s1 kernel: LNetError: 70158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 16 10:20:47 fir-io7-s1 kernel: LNetError: 70564:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 10:20:47 fir-io7-s1 kernel: LNetError: 70564:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 16 10:23:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 6 seconds Mar 16 10:23:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 550 previous similar messages Mar 16 10:28:32 fir-io7-s1 kernel: LNetError: 70903:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 10:28:32 fir-io7-s1 kernel: LNetError: 70903:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 10:30:51 fir-io7-s1 kernel: LNetError: 70903:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 10:30:51 fir-io7-s1 kernel: LNetError: 70903:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 16 10:34:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 16 10:34:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 510 previous similar messages Mar 16 10:38:36 fir-io7-s1 kernel: LNetError: 70903:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 10:38:36 fir-io7-s1 kernel: LNetError: 70903:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 16 10:40:51 fir-io7-s1 kernel: LNetError: 71336:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 10:40:51 fir-io7-s1 kernel: LNetError: 71336:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 16 10:44:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 3 seconds Mar 16 10:44:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 415 previous similar messages Mar 16 10:48:41 fir-io7-s1 kernel: LNetError: 71336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 10:48:41 fir-io7-s1 kernel: LNetError: 71336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 16 10:50:51 fir-io7-s1 kernel: LNetError: 69945:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 10:50:51 fir-io7-s1 kernel: LNetError: 69945:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 16 10:54:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 4 seconds Mar 16 10:54:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 269 previous similar messages Mar 16 10:58:41 fir-io7-s1 kernel: LNetError: 71730:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 10:58:41 fir-io7-s1 kernel: LNetError: 71730:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 16 11:01:01 fir-io7-s1 kernel: LNetError: 71730:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 11:01:01 fir-io7-s1 kernel: LNetError: 71730:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 16 11:04:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 7 seconds Mar 16 11:04:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 406 previous similar messages Mar 16 11:08:46 fir-io7-s1 kernel: LNetError: 72213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 11:08:46 fir-io7-s1 kernel: LNetError: 72213:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 11:11:01 fir-io7-s1 kernel: LNetError: 72387:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 11:11:01 fir-io7-s1 kernel: LNetError: 72387:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 16 11:14:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 6 seconds Mar 16 11:14:11 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 453 previous similar messages Mar 16 11:18:51 fir-io7-s1 kernel: LNetError: 72817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 16 11:18:51 fir-io7-s1 kernel: LNetError: 72817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 11:21:06 fir-io7-s1 kernel: LNetError: 41576:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 11:21:06 fir-io7-s1 kernel: LNetError: 41576:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 16 11:24:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 7 seconds Mar 16 11:24:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 431 previous similar messages Mar 16 11:28:51 fir-io7-s1 kernel: LNetError: 72817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 11:28:51 fir-io7-s1 kernel: LNetError: 72817:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 11:31:06 fir-io7-s1 kernel: LNetError: 72817:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 11:31:06 fir-io7-s1 kernel: LNetError: 72817:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 16 11:31:31 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 1a4cd521-9e5e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7c377a5400, cur 1584383491 expire 1584383341 last 1584383264 Mar 16 11:31:31 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 11:32:59 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 11:32:59 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 11:32:59 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 11:34:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 1 seconds Mar 16 11:34:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 16 11:38:56 fir-io7-s1 kernel: LNetError: 73383:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 11:38:56 fir-io7-s1 kernel: LNetError: 73383:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 316 previous similar messages Mar 16 11:41:06 fir-io7-s1 kernel: LNetError: 73611:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 11:41:06 fir-io7-s1 kernel: LNetError: 73611:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 16 11:42:13 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client e4267963-0097-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7659e8c000, cur 1584384133 expire 1584383983 last 1584383906 Mar 16 11:42:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 11:42:22 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 11:42:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 11:44:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 16 11:44:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 332 previous similar messages Mar 16 11:48:56 fir-io7-s1 kernel: LNetError: 73680:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 11:48:56 fir-io7-s1 kernel: LNetError: 73680:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 11:51:06 fir-io7-s1 kernel: LNetError: 73384:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 11:51:06 fir-io7-s1 kernel: LNetError: 73384:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 16 11:53:41 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 7d350b3a-a9a2-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cd372800, cur 1584384821 expire 1584384671 last 1584384594 Mar 16 11:53:41 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 11:53:41 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 11:53:41 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 11:54:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 9 seconds Mar 16 11:54:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 383 previous similar messages Mar 16 11:55:24 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to c391bfc0-9e0c-4 (at 10.50.17.28@o2ib2) Mar 16 11:55:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 11:59:06 fir-io7-s1 kernel: LNetError: 74094:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 11:59:06 fir-io7-s1 kernel: LNetError: 74094:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 12:01:08 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 12:01:08 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 16 12:04:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 4 seconds Mar 16 12:04:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 472 previous similar messages Mar 16 12:05:00 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 5623a124-ac35-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c4a5ae65000, cur 1584385500 expire 1584385350 last 1584385273 Mar 16 12:05:00 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 12:05:31 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 12:05:31 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:09:16 fir-io7-s1 kernel: LNetError: 74812:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 12:09:16 fir-io7-s1 kernel: LNetError: 74812:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 12:11:11 fir-io7-s1 kernel: LNetError: 74812:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 12:11:11 fir-io7-s1 kernel: LNetError: 74812:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 16 12:14:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 16 12:14:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 645 previous similar messages Mar 16 12:15:54 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 12:15:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:16:51 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 1338b959-7737-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8810e93800, cur 1584386211 expire 1584386061 last 1584385984 Mar 16 12:16:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:19:21 fir-io7-s1 kernel: LNetError: 7562:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 12:19:21 fir-io7-s1 kernel: LNetError: 7562:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 12:21:16 fir-io7-s1 kernel: LNetError: 74812:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 12:21:16 fir-io7-s1 kernel: LNetError: 74812:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 16 12:24:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 3 seconds Mar 16 12:24:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 873 previous similar messages Mar 16 12:26:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 12:26:00 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:27:14 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 692b35ea-e7c0-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992e33400, cur 1584386834 expire 1584386684 last 1584386607 Mar 16 12:27:14 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:29:26 fir-io7-s1 kernel: LNetError: 75380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 12:29:26 fir-io7-s1 kernel: LNetError: 75380:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 16 12:31:21 fir-io7-s1 kernel: LNetError: 75681:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 12:31:21 fir-io7-s1 kernel: LNetError: 75681:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 16 12:34:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 6 seconds Mar 16 12:34:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 826 previous similar messages Mar 16 12:36:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 12:36:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:37:19 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 6f221a6e-87f5-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698f100400, cur 1584387439 expire 1584387289 last 1584387212 Mar 16 12:37:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:39:26 fir-io7-s1 kernel: LNetError: 75681:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 12:39:26 fir-io7-s1 kernel: LNetError: 75681:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 12:41:21 fir-io7-s1 kernel: LNetError: 75911:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 12:41:21 fir-io7-s1 kernel: LNetError: 75911:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 88 previous similar messages Mar 16 12:44:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 9 seconds Mar 16 12:44:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 714 previous similar messages Mar 16 12:46:19 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 12:46:19 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:47:22 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 90135585-4161-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c768773ec00, cur 1584388042 expire 1584387892 last 1584387815 Mar 16 12:47:22 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 12:49:36 fir-io7-s1 kernel: LNetError: 76336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 12:49:36 fir-io7-s1 kernel: LNetError: 76336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 12:51:31 fir-io7-s1 kernel: LNetError: 76336:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 12:51:31 fir-io7-s1 kernel: LNetError: 76336:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 105 previous similar messages Mar 16 12:54:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 16 12:54:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 825 previous similar messages Mar 16 12:59:41 fir-io7-s1 kernel: LNetError: 76528:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 12:59:41 fir-io7-s1 kernel: LNetError: 76528:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 350 previous similar messages Mar 16 13:01:31 fir-io7-s1 kernel: LNetError: 76833:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 13:01:31 fir-io7-s1 kernel: LNetError: 76833:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 107 previous similar messages Mar 16 13:04:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 16 13:04:59 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 816 previous similar messages Mar 16 13:09:46 fir-io7-s1 kernel: LNetError: 76833:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 13:09:46 fir-io7-s1 kernel: LNetError: 76833:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 13:11:31 fir-io7-s1 kernel: LNetError: 77234:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 13:11:31 fir-io7-s1 kernel: LNetError: 77234:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 100 previous similar messages Mar 16 13:15:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 16 13:15:05 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 728 previous similar messages Mar 16 13:19:56 fir-io7-s1 kernel: LNetError: 77234:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 13:19:56 fir-io7-s1 kernel: LNetError: 77234:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 16 13:21:41 fir-io7-s1 kernel: LNetError: 77234:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 13:21:41 fir-io7-s1 kernel: LNetError: 77234:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 121 previous similar messages Mar 16 13:25:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 16 13:25:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 788 previous similar messages Mar 16 13:29:56 fir-io7-s1 kernel: LNetError: 77234:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 13:29:56 fir-io7-s1 kernel: LNetError: 77234:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 16 13:31:46 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 13:31:46 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 108 previous similar messages Mar 16 13:35:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 1 seconds Mar 16 13:35:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 708 previous similar messages Mar 16 13:39:56 fir-io7-s1 kernel: LNetError: 78438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 13:39:56 fir-io7-s1 kernel: LNetError: 78438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 13:41:51 fir-io7-s1 kernel: LNetError: 78438:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 13:41:51 fir-io7-s1 kernel: LNetError: 78438:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 137 previous similar messages Mar 16 13:45:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 7 seconds Mar 16 13:45:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 698 previous similar messages Mar 16 13:50:06 fir-io7-s1 kernel: LNetError: 78438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 13:50:06 fir-io7-s1 kernel: LNetError: 78438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 13:51:06 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 38133cfa-06b9-4 (at 10.50.1.38@o2ib2) Mar 16 13:51:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 13:51:56 fir-io7-s1 kernel: LNetError: 78910:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 13:51:56 fir-io7-s1 kernel: LNetError: 78910:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 16 13:52:11 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 38133cfa-06b9-4 (at 10.50.1.38@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c875aac6000, cur 1584391931 expire 1584391781 last 1584391704 Mar 16 13:52:11 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 13:55:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 16 13:55:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 701 previous similar messages Mar 16 14:00:11 fir-io7-s1 kernel: LNetError: 79158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 14:00:11 fir-io7-s1 kernel: LNetError: 79158:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 14:01:56 fir-io7-s1 kernel: LNetError: 79262:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 14:01:56 fir-io7-s1 kernel: LNetError: 79262:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 16 14:05:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 16 14:05:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 658 previous similar messages Mar 16 14:10:11 fir-io7-s1 kernel: LNetError: 79368:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 14:10:11 fir-io7-s1 kernel: LNetError: 79368:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 14:11:56 fir-io7-s1 kernel: LNetError: 79492:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 14:11:56 fir-io7-s1 kernel: LNetError: 79492:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 16 14:15:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 16 14:15:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 657 previous similar messages Mar 16 14:20:11 fir-io7-s1 kernel: LNetError: 79711:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 14:20:11 fir-io7-s1 kernel: LNetError: 79711:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 14:22:06 fir-io7-s1 kernel: LNetError: 80097:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 14:22:06 fir-io7-s1 kernel: LNetError: 80097:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 143 previous similar messages Mar 16 14:25:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 1 seconds Mar 16 14:25:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 609 previous similar messages Mar 16 14:30:16 fir-io7-s1 kernel: LNetError: 80097:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 14:30:16 fir-io7-s1 kernel: LNetError: 80097:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 14:32:06 fir-io7-s1 kernel: LNetError: 80478:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 14:32:06 fir-io7-s1 kernel: LNetError: 80478:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 16 14:35:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 7 seconds Mar 16 14:35:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 667 previous similar messages Mar 16 14:40:21 fir-io7-s1 kernel: LNetError: 80478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 14:40:21 fir-io7-s1 kernel: LNetError: 80478:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 14:42:06 fir-io7-s1 kernel: LNetError: 80872:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 14:42:06 fir-io7-s1 kernel: LNetError: 80872:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 144 previous similar messages Mar 16 14:44:26 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to 2cf4b245-9c94-4 (at 10.50.1.60@o2ib2) Mar 16 14:44:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 14:45:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 2 seconds Mar 16 14:45:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 618 previous similar messages Mar 16 14:47:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 14:47:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 14:48:01 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 0258d085-4695-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f1d7000, cur 1584395281 expire 1584395131 last 1584395054 Mar 16 14:48:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 14:50:31 fir-io7-s1 kernel: LNetError: 80872:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 14:50:31 fir-io7-s1 kernel: LNetError: 80872:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 14:52:11 fir-io7-s1 kernel: LNetError: 80872:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 14:52:11 fir-io7-s1 kernel: LNetError: 80872:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 16 14:55:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 10 seconds Mar 16 14:55:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 639 previous similar messages Mar 16 14:57:27 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 14:57:27 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 14:58:45 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 0d24f2b1-f5fd-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4e80adc000, cur 1584395925 expire 1584395775 last 1584395698 Mar 16 14:58:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 15:00:41 fir-io7-s1 kernel: LNetError: 80872:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 15:00:41 fir-io7-s1 kernel: LNetError: 80872:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 15:02:11 fir-io7-s1 kernel: LNetError: 81665:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 15:02:11 fir-io7-s1 kernel: LNetError: 81665:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 16 15:06:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 6 seconds Mar 16 15:06:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 586 previous similar messages Mar 16 15:10:41 fir-io7-s1 kernel: LNetError: 81665:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 15:10:41 fir-io7-s1 kernel: LNetError: 81665:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 16 15:12:16 fir-io7-s1 kernel: LNetError: 82061:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 15:12:16 fir-io7-s1 kernel: LNetError: 82061:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 16 15:16:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 9 seconds Mar 16 15:16:03 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 610 previous similar messages Mar 16 15:19:24 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client dd28c588-e6bf-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7a01e02800, cur 1584397164 expire 1584397014 last 1584396937 Mar 16 15:19:24 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 15:19:29 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client dd28c588-e6bf-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6991e83c00, cur 1584397169 expire 1584397019 last 1584396942 Mar 16 15:20:23 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to a282389e-3a6d-4 (at 10.50.6.54@o2ib2) Mar 16 15:20:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 15:20:46 fir-io7-s1 kernel: LNetError: 82061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 15:20:46 fir-io7-s1 kernel: LNetError: 82061:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 15:22:16 fir-io7-s1 kernel: LNetError: 82448:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 15:22:16 fir-io7-s1 kernel: LNetError: 82448:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 16 15:26:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 5 seconds Mar 16 15:26:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 554 previous similar messages Mar 16 15:30:46 fir-io7-s1 kernel: LNetError: 82448:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 15:30:46 fir-io7-s1 kernel: LNetError: 82448:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 16 15:32:21 fir-io7-s1 kernel: LNetError: 82448:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 15:32:21 fir-io7-s1 kernel: LNetError: 82448:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 16 15:36:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 15:36:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 608 previous similar messages Mar 16 15:40:56 fir-io7-s1 kernel: LNetError: 83209:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 15:40:56 fir-io7-s1 kernel: LNetError: 83209:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 296 previous similar messages Mar 16 15:42:21 fir-io7-s1 kernel: LNetError: 83209:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 15:42:21 fir-io7-s1 kernel: LNetError: 83209:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 16 15:46:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 15:46:13 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 583 previous similar messages Mar 16 15:51:06 fir-io7-s1 kernel: LNetError: 83209:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 15:51:06 fir-io7-s1 kernel: LNetError: 83209:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 15:52:21 fir-io7-s1 kernel: LNetError: 83449:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 15:52:21 fir-io7-s1 kernel: LNetError: 83449:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 16 15:56:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 4 seconds Mar 16 15:56:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 643 previous similar messages Mar 16 16:01:06 fir-io7-s1 kernel: LNetError: 83969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 16:01:06 fir-io7-s1 kernel: LNetError: 83969:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 16 16:02:21 fir-io7-s1 kernel: LNetError: 83969:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 16:02:21 fir-io7-s1 kernel: LNetError: 83969:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 16 16:06:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 10 seconds Mar 16 16:06:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 601 previous similar messages Mar 16 16:11:06 fir-io7-s1 kernel: LNetError: 76748:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 16:11:06 fir-io7-s1 kernel: LNetError: 76748:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 16:12:21 fir-io7-s1 kernel: LNetError: 83449:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 16:12:21 fir-io7-s1 kernel: LNetError: 83449:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 178 previous similar messages Mar 16 16:16:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 16 16:16:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 543 previous similar messages Mar 16 16:21:11 fir-io7-s1 kernel: LNetError: 84566:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 16:21:11 fir-io7-s1 kernel: LNetError: 84566:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 16:22:26 fir-io7-s1 kernel: LNetError: 84787:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 16:22:26 fir-io7-s1 kernel: LNetError: 84787:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 184 previous similar messages Mar 16 16:26:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 16:26:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 541 previous similar messages Mar 16 16:31:16 fir-io7-s1 kernel: LNetError: 85057:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 16:31:16 fir-io7-s1 kernel: LNetError: 85057:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 16:32:26 fir-io7-s1 kernel: LNetError: 83449:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 16:32:26 fir-io7-s1 kernel: LNetError: 83449:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 16 16:36:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 6 seconds Mar 16 16:36:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 548 previous similar messages Mar 16 16:41:21 fir-io7-s1 kernel: LNetError: 85057:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 16:41:21 fir-io7-s1 kernel: LNetError: 85057:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 16 16:42:26 fir-io7-s1 kernel: LNetError: 85565:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 16:42:26 fir-io7-s1 kernel: LNetError: 85565:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 16 16:46:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 7 seconds Mar 16 16:46:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 545 previous similar messages Mar 16 16:51:26 fir-io7-s1 kernel: LNetError: 85942:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 16:51:26 fir-io7-s1 kernel: LNetError: 85942:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 16:52:31 fir-io7-s1 kernel: LNetError: 85942:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 16:52:31 fir-io7-s1 kernel: LNetError: 85942:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 16 16:56:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 10 seconds Mar 16 16:56:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 561 previous similar messages Mar 16 17:01:26 fir-io7-s1 kernel: LNetError: 86338:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 17:01:26 fir-io7-s1 kernel: LNetError: 86338:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 17:01:57 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 17:01:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:02:31 fir-io7-s1 kernel: LNetError: 86338:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 17:02:31 fir-io7-s1 kernel: LNetError: 86338:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 16 17:02:50 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client f8dd51cd-3d9c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4d7ab70000, cur 1584403370 expire 1584403220 last 1584403143 Mar 16 17:02:50 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 17:02:53 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client f8dd51cd-3d9c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6bffcff000, cur 1584403373 expire 1584403223 last 1584403146 Mar 16 17:02:53 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 17:06:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 3 seconds Mar 16 17:06:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 573 previous similar messages Mar 16 17:07:59 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 17:07:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:08:54 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 7fa51ec6-4ec4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4a592db400, cur 1584403734 expire 1584403584 last 1584403507 Mar 16 17:09:06 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 7fa51ec6-4ec4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79ccd07000, cur 1584403746 expire 1584403596 last 1584403519 Mar 16 17:09:06 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 17:11:36 fir-io7-s1 kernel: LNetError: 86608:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 17:11:36 fir-io7-s1 kernel: LNetError: 86608:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 307 previous similar messages Mar 16 17:12:31 fir-io7-s1 kernel: LNetError: 86608:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 17:12:31 fir-io7-s1 kernel: LNetError: 86608:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 16 17:16:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 5 seconds Mar 16 17:16:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 554 previous similar messages Mar 16 17:18:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 17:18:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:19:28 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client d126da0c-bc1e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992900c00, cur 1584404368 expire 1584404218 last 1584404141 Mar 16 17:19:28 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 16 17:19:44 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client d126da0c-bc1e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c78ef419c00, cur 1584404384 expire 1584404234 last 1584404157 Mar 16 17:19:44 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 17:21:41 fir-io7-s1 kernel: LNetError: 87002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 17:21:41 fir-io7-s1 kernel: LNetError: 87002:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 17:22:31 fir-io7-s1 kernel: LNetError: 86960:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 17:22:31 fir-io7-s1 kernel: LNetError: 86960:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 16 17:27:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 7 seconds Mar 16 17:27:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 534 previous similar messages Mar 16 17:30:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 17:30:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:31:46 fir-io7-s1 kernel: LNetError: 87298:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 17:31:46 fir-io7-s1 kernel: LNetError: 87298:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 17:32:08 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 482aef99-f122-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6a04fcc000, cur 1584405128 expire 1584404978 last 1584404901 Mar 16 17:32:08 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 16 17:32:33 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 17:32:33 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 16 17:37:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 5 seconds Mar 16 17:37:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 535 previous similar messages Mar 16 17:41:46 fir-io7-s1 kernel: LNetError: 87512:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 17:41:46 fir-io7-s1 kernel: LNetError: 87512:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 16 17:42:34 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 17:42:34 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 16 17:43:25 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 17:43:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:44:36 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client a882ebd6-baae-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c70682c7000, cur 1584405876 expire 1584405726 last 1584405649 Mar 16 17:44:36 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:47:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 16 17:47:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 580 previous similar messages Mar 16 17:49:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 17:49:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:50:38 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 3b771435-ddb1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c70682c0000, cur 1584406238 expire 1584406088 last 1584406011 Mar 16 17:50:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 17:50:58 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 3b771435-ddb1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c8182fb8800, cur 1584406258 expire 1584406108 last 1584406031 Mar 16 17:51:51 fir-io7-s1 kernel: LNetError: 87899:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 17:51:51 fir-io7-s1 kernel: LNetError: 87899:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 17:52:36 fir-io7-s1 kernel: LNetError: 88290:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 17:52:36 fir-io7-s1 kernel: LNetError: 88290:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 16 17:57:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 16 17:57:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 503 previous similar messages Mar 16 18:02:01 fir-io7-s1 kernel: LNetError: 88290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 18:02:01 fir-io7-s1 kernel: LNetError: 88290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 18:02:36 fir-io7-s1 kernel: LNetError: 88717:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 18:02:36 fir-io7-s1 kernel: LNetError: 88717:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 16 18:03:09 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client c50201f4-d5fc-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6991dda800, cur 1584406989 expire 1584406839 last 1584406762 Mar 16 18:03:09 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 18:04:23 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:04:23 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 18:04:25 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client b7a74782-5b77-4 (at 10.50.6.54@o2ib2) in 208 seconds. I think it's dead, and I am evicting it. exp ffff9c78e1f1a400, cur 1584407065 expire 1584406915 last 1584406857 Mar 16 18:04:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 18:04:27 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client b7a74782-5b77-4 (at 10.50.6.54@o2ib2) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c698f4eb000, cur 1584407067 expire 1584406917 last 1584406840 Mar 16 18:04:27 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 18:04:44 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client b7a74782-5b77-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69923ad400, cur 1584407084 expire 1584406934 last 1584406857 Mar 16 18:05:53 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to a282389e-3a6d-4 (at 10.50.6.54@o2ib2) Mar 16 18:05:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 18:07:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 16 18:07:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 470 previous similar messages Mar 16 18:12:11 fir-io7-s1 kernel: LNetError: 89073:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 18:12:11 fir-io7-s1 kernel: LNetError: 89073:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 18:12:36 fir-io7-s1 kernel: LNetError: 88499:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 18:12:36 fir-io7-s1 kernel: LNetError: 88499:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 16 18:13:35 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:13:35 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 18:14:52 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client a073a39b-c583-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8a2f123400, cur 1584407692 expire 1584407542 last 1584407465 Mar 16 18:17:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 16 18:17:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 442 previous similar messages Mar 16 18:22:21 fir-io7-s1 kernel: LNetError: 89316:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 18:22:21 fir-io7-s1 kernel: LNetError: 89316:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 18:22:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 18:22:38 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 16 18:27:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 16 18:27:54 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 463 previous similar messages Mar 16 18:28:21 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:28:21 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:28:21 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 18:29:17 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 38e6bff4-bdfa-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7f3c559400, cur 1584408557 expire 1584408407 last 1584408330 Mar 16 18:29:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 18:29:29 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 38e6bff4-bdfa-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c70682c5000, cur 1584408569 expire 1584408419 last 1584408342 Mar 16 18:29:29 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 18:32:26 fir-io7-s1 kernel: LNetError: 89696:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 18:32:26 fir-io7-s1 kernel: LNetError: 89696:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 18:32:41 fir-io7-s1 kernel: LNetError: 89696:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 18:32:41 fir-io7-s1 kernel: LNetError: 89696:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 16 18:37:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 16 18:37:55 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 450 previous similar messages Mar 16 18:38:59 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:38:59 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 18:39:54 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 6549b717-5684-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79cebb6800, cur 1584409194 expire 1584409044 last 1584408967 Mar 16 18:39:54 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 16 18:40:05 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 6549b717-5684-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c79cebb5000, cur 1584409205 expire 1584409055 last 1584408978 Mar 16 18:40:05 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 16 18:42:26 fir-io7-s1 kernel: LNetError: 89696:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 18:42:26 fir-io7-s1 kernel: LNetError: 89696:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 18:42:41 fir-io7-s1 kernel: LNetError: 90221:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 18:42:41 fir-io7-s1 kernel: LNetError: 90221:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 16 18:48:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 16 18:48:00 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 322 previous similar messages Mar 16 18:48:00 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:48:55 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 2be98bf3-0d78-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6992381c00, cur 1584409735 expire 1584409585 last 1584409508 Mar 16 18:48:55 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 18:49:03 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 2be98bf3-0d78-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7a2f17ec00, cur 1584409743 expire 1584409593 last 1584409516 Mar 16 18:49:03 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 18:52:31 fir-io7-s1 kernel: LNetError: 90221:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 18:52:31 fir-io7-s1 kernel: LNetError: 90221:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 16 18:52:46 fir-io7-s1 kernel: LNetError: 90612:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 18:52:46 fir-io7-s1 kernel: LNetError: 90612:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 146 previous similar messages Mar 16 18:58:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 18:58:05 fir-io7-s1 kernel: Lustre: Skipped 10 previous similar messages Mar 16 18:58:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 5 seconds Mar 16 18:58:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 412 previous similar messages Mar 16 18:59:00 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 28873c82-f845-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7319cf5000, cur 1584410340 expire 1584410190 last 1584410113 Mar 16 19:02:41 fir-io7-s1 kernel: LNetError: 90845:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 19:02:41 fir-io7-s1 kernel: LNetError: 90845:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 19:02:46 fir-io7-s1 kernel: LNetError: 89507:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 19:02:46 fir-io7-s1 kernel: LNetError: 89507:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 139 previous similar messages Mar 16 19:08:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 16 19:08:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages Mar 16 19:08:26 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 19:08:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 19:09:18 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client fbcbc839-8376-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f1c5d7c00, cur 1584410958 expire 1584410808 last 1584410731 Mar 16 19:09:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 19:09:48 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client fbcbc839-8376-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6991c4f800, cur 1584410988 expire 1584410838 last 1584410761 Mar 16 19:09:48 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 19:12:41 fir-io7-s1 kernel: LNetError: 91162:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 19:12:41 fir-io7-s1 kernel: LNetError: 91162:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 19:12:46 fir-io7-s1 kernel: LNetError: 89507:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 19:12:46 fir-io7-s1 kernel: LNetError: 89507:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 16 19:18:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 9 seconds Mar 16 19:18:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 508 previous similar messages Mar 16 19:18:36 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 034007ad-82e0-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f91f94c00, cur 1584411516 expire 1584411366 last 1584411289 Mar 16 19:18:56 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 19:18:56 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 19:22:46 fir-io7-s1 kernel: LNetError: 91443:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 19:22:46 fir-io7-s1 kernel: LNetError: 91443:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 19:22:46 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 19:22:46 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 16 19:25:37 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 19:25:37 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 16 19:26:55 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client dd233763-f339-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7a2d835400, cur 1584412015 expire 1584411865 last 1584411788 Mar 16 19:26:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 19:28:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 16 19:28:15 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 477 previous similar messages Mar 16 19:32:46 fir-io7-s1 kernel: LNetError: 91826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 19:32:46 fir-io7-s1 kernel: LNetError: 91826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 19:32:46 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 19:32:46 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 175 previous similar messages Mar 16 19:34:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 19:34:47 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 19:36:06 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 3b91e39e-d317-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c5816afec00, cur 1584412566 expire 1584412416 last 1584412339 Mar 16 19:36:06 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 19:38:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 16 19:38:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 501 previous similar messages Mar 16 19:42:56 fir-io7-s1 kernel: LNetError: 91826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 19:42:56 fir-io7-s1 kernel: LNetError: 91826:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 19:42:56 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 19:42:56 fir-io7-s1 kernel: LNetError: 73965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 154 previous similar messages Mar 16 19:44:07 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 48978fb4-178b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c5a08761000, cur 1584413047 expire 1584412897 last 1584412820 Mar 16 19:44:07 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 19:44:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 19:44:28 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 16 19:48:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 5 seconds Mar 16 19:48:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 398 previous similar messages Mar 16 19:52:56 fir-io7-s1 kernel: LNetError: 92016:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 19:52:56 fir-io7-s1 kernel: LNetError: 92016:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 187 previous similar messages Mar 16 19:53:01 fir-io7-s1 kernel: LNetError: 92597:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 19:53:01 fir-io7-s1 kernel: LNetError: 92597:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 16 19:58:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 5 seconds Mar 16 19:58:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 427 previous similar messages Mar 16 19:58:51 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 19:58:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 19:59:45 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client aa68f28a-1fa6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f1c5d6c00, cur 1584413985 expire 1584413835 last 1584413758 Mar 16 19:59:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 20:02:56 fir-io7-s1 kernel: LNetError: 92016:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 20:02:56 fir-io7-s1 kernel: LNetError: 92016:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 16 20:03:06 fir-io7-s1 kernel: LNetError: 92990:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 16 20:03:06 fir-io7-s1 kernel: LNetError: 92990:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 16 20:07:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:07:28 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 20:08:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 16 20:08:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 402 previous similar messages Mar 16 20:12:56 fir-io7-s1 kernel: LNetError: 93735:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 20:12:56 fir-io7-s1 kernel: LNetError: 93735:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 16 20:13:06 fir-io7-s1 kernel: LNetError: 93387:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 20:13:06 fir-io7-s1 kernel: LNetError: 93387:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 20:14:13 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:14:13 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 20:15:26 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 6d4fbd75-65c6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7787165800, cur 1584414926 expire 1584414776 last 1584414699 Mar 16 20:15:26 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 20:18:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 16 20:18:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 453 previous similar messages Mar 16 20:21:54 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:21:54 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 20:22:56 fir-io7-s1 kernel: LNetError: 93769:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 20:22:56 fir-io7-s1 kernel: LNetError: 93769:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 16 20:23:11 fir-io7-s1 kernel: LNetError: 94131:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 20:23:11 fir-io7-s1 kernel: LNetError: 94131:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 16 20:28:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 16 20:28:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 432 previous similar messages Mar 16 20:29:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:29:28 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 16 20:30:11 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client ab955080-d485-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c51af4e0000, cur 1584415811 expire 1584415661 last 1584415584 Mar 16 20:30:11 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 20:32:56 fir-io7-s1 kernel: LNetError: 94498:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 20:32:56 fir-io7-s1 kernel: LNetError: 94498:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 215 previous similar messages Mar 16 20:33:16 fir-io7-s1 kernel: LNetError: 94498:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 20:33:16 fir-io7-s1 kernel: LNetError: 94498:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 371 previous similar messages Mar 16 20:37:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:37:28 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 20:38:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 5 seconds Mar 16 20:38:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 496 previous similar messages Mar 16 20:43:02 fir-io7-s1 kernel: LNetError: 94498:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 20:43:02 fir-io7-s1 kernel: LNetError: 94498:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 179 previous similar messages Mar 16 20:43:17 fir-io7-s1 kernel: LNetError: 94498:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 20:43:17 fir-io7-s1 kernel: LNetError: 94498:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 20:44:45 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:44:45 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 16 20:45:41 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 89d9eed7-4475-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c54d4901400, cur 1584416741 expire 1584416591 last 1584416514 Mar 16 20:45:41 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 20:48:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 5 seconds Mar 16 20:48:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 402 previous similar messages Mar 16 20:51:40 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:51:40 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 20:53:02 fir-io7-s1 kernel: LNetError: 94498:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 20:53:02 fir-io7-s1 kernel: LNetError: 94498:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 16 20:53:17 fir-io7-s1 kernel: LNetError: 95288:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 20:53:17 fir-io7-s1 kernel: LNetError: 95288:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 16 20:57:29 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 20:57:29 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 20:58:49 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 8c9e3460-d89a-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698e534400, cur 1584417529 expire 1584417379 last 1584417302 Mar 16 20:58:49 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 20:58:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 16 20:58:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 440 previous similar messages Mar 16 21:03:02 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 21:03:02 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 16 21:03:31 fir-io7-s1 kernel: LNetError: 95288:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 21:03:31 fir-io7-s1 kernel: LNetError: 95288:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 21:04:42 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 21:04:42 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 21:08:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 16 21:08:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 429 previous similar messages Mar 16 21:13:06 fir-io7-s1 kernel: LNetError: 95961:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 21:13:06 fir-io7-s1 kernel: LNetError: 95961:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 16 21:13:36 fir-io7-s1 kernel: LNetError: 95961:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 21:13:36 fir-io7-s1 kernel: LNetError: 95961:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 312 previous similar messages Mar 16 21:19:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 16 21:19:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 396 previous similar messages Mar 16 21:22:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 21:22:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 21:22:53 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 3dfee321-99d6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c799a1c9800, cur 1584418973 expire 1584418823 last 1584418746 Mar 16 21:22:53 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 21:23:06 fir-io7-s1 kernel: LNetError: 95961:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 21:23:06 fir-io7-s1 kernel: LNetError: 95961:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 16 21:23:36 fir-io7-s1 kernel: LNetError: 96451:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 16 21:23:36 fir-io7-s1 kernel: LNetError: 96451:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 297 previous similar messages Mar 16 21:29:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 16 21:29:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 165 previous similar messages Mar 16 21:33:06 fir-io7-s1 kernel: LNetError: 96451:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 21:33:06 fir-io7-s1 kernel: LNetError: 96451:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 16 21:33:41 fir-io7-s1 kernel: LNetError: 96824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 21:33:41 fir-io7-s1 kernel: LNetError: 96824:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 290 previous similar messages Mar 16 21:34:01 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 21:34:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 21:34:53 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 90644034-8ed0-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990409400, cur 1584419693 expire 1584419543 last 1584419466 Mar 16 21:34:53 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 21:39:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 16 21:39:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 397 previous similar messages Mar 16 21:43:06 fir-io7-s1 kernel: LNetError: 96824:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 21:43:06 fir-io7-s1 kernel: LNetError: 96824:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 189 previous similar messages Mar 16 21:43:41 fir-io7-s1 kernel: LNetError: 97202:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 21:43:41 fir-io7-s1 kernel: LNetError: 97202:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 16 21:47:02 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 21:47:02 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 21:47:55 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 6bbdb147-7dca-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6990a18400, cur 1584420475 expire 1584420325 last 1584420248 Mar 16 21:47:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 21:48:16 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 6bbdb147-7dca-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6990a1fc00, cur 1584420496 expire 1584420346 last 1584420269 Mar 16 21:48:16 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 21:49:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 16 21:49:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 457 previous similar messages Mar 16 21:53:06 fir-io7-s1 kernel: LNetError: 97201:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 21:53:06 fir-io7-s1 kernel: LNetError: 97201:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 169 previous similar messages Mar 16 21:53:46 fir-io7-s1 kernel: LNetError: 97202:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 21:53:46 fir-io7-s1 kernel: LNetError: 97202:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 21:53:58 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 21:53:58 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 21:54:52 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 7c3b957e-2822-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6991c96000, cur 1584420892 expire 1584420742 last 1584420665 Mar 16 21:54:52 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 16 21:55:01 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 7c3b957e-2822-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6991c92000, cur 1584420901 expire 1584420751 last 1584420674 Mar 16 21:55:01 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 16 21:59:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 16 21:59:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 363 previous similar messages Mar 16 22:01:04 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:01:04 fir-io7-s1 kernel: Lustre: Skipped 6 previous similar messages Mar 16 22:02:22 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client 538c6cb7-b94b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6f1c5d5c00, cur 1584421342 expire 1584421192 last 1584421115 Mar 16 22:02:22 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 16 22:03:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 22:03:07 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 159 previous similar messages Mar 16 22:03:46 fir-io7-s1 kernel: LNetError: 97984:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 22:03:46 fir-io7-s1 kernel: LNetError: 97984:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 22:08:05 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:08:05 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:08:58 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 39d3e7fe-1a81-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c803b02cc00, cur 1584421738 expire 1584421588 last 1584421511 Mar 16 22:08:58 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:09:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 10 seconds Mar 16 22:09:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 469 previous similar messages Mar 16 22:13:11 fir-io7-s1 kernel: LNetError: 97201:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 22:13:11 fir-io7-s1 kernel: LNetError: 97201:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 16 22:13:46 fir-io7-s1 kernel: LNetError: 98262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 22:13:46 fir-io7-s1 kernel: LNetError: 98262:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 16 22:14:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:14:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:15:26 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 352c2869-a286-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c7ae1d7f400, cur 1584422126 expire 1584421976 last 1584421899 Mar 16 22:15:26 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:19:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 1 seconds Mar 16 22:19:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 399 previous similar messages Mar 16 22:21:21 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:21:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:22:17 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client 38180d71-907b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c89e5bd7000, cur 1584422537 expire 1584422387 last 1584422310 Mar 16 22:22:17 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:23:16 fir-io7-s1 kernel: LNetError: 98262:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 22:23:16 fir-io7-s1 kernel: LNetError: 98262:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 157 previous similar messages Mar 16 22:23:46 fir-io7-s1 kernel: LNetError: 98801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 22:23:46 fir-io7-s1 kernel: LNetError: 98801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 16 22:27:51 fir-io7-s1 kernel: Lustre: fir-OST004a: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:27:51 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:28:46 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 47073aca-09fe-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c51af39b800, cur 1584422926 expire 1584422776 last 1584422699 Mar 16 22:28:46 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:29:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 16 22:29:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages Mar 16 22:33:16 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 22:33:16 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 16 22:33:51 fir-io7-s1 kernel: LNetError: 98801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 22:33:51 fir-io7-s1 kernel: LNetError: 98801:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 16 22:33:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:33:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:34:54 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 4605bcc2-6092-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79cd7e3000, cur 1584423294 expire 1584423144 last 1584423067 Mar 16 22:34:54 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:39:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 16 22:39:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 806 previous similar messages Mar 16 22:43:16 fir-io7-s1 kernel: LNetError: 99202:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 22:43:16 fir-io7-s1 kernel: LNetError: 99202:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 160 previous similar messages Mar 16 22:43:56 fir-io7-s1 kernel: LNetError: 99571:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 22:43:56 fir-io7-s1 kernel: LNetError: 99571:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 330 previous similar messages Mar 16 22:46:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:46:18 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:47:01 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b530962c-4cdb-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c698ea3e000, cur 1584424021 expire 1584423871 last 1584423794 Mar 16 22:47:01 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:50:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 16 22:50:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 838 previous similar messages Mar 16 22:53:16 fir-io7-s1 kernel: LNetError: 99571:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 22:53:16 fir-io7-s1 kernel: LNetError: 99571:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 90 previous similar messages Mar 16 22:53:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:53:28 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:53:56 fir-io7-s1 kernel: LNetError: 99954:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 16 22:53:56 fir-io7-s1 kernel: LNetError: 99954:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 16 22:55:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 22:55:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 22:57:15 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client ac5dd0f6-5df4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7c1a9b9800, cur 1584424635 expire 1584424485 last 1584424408 Mar 16 22:57:15 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 23:00:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 6 seconds Mar 16 23:00:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 562 previous similar messages Mar 16 23:03:21 fir-io7-s1 kernel: LNetError: 99954:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 23:03:21 fir-io7-s1 kernel: LNetError: 99954:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 121 previous similar messages Mar 16 23:04:01 fir-io7-s1 kernel: LNetError: 100440:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 16 23:04:01 fir-io7-s1 kernel: LNetError: 100440:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 295 previous similar messages Mar 16 23:05:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 23:05:32 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:10:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 8 seconds Mar 16 23:10:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 719 previous similar messages Mar 16 23:11:56 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 23:11:56 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 16 23:12:50 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 3135b037-8f09-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6b34efec00, cur 1584425570 expire 1584425420 last 1584425343 Mar 16 23:12:50 fir-io7-s1 kernel: Lustre: Skipped 11 previous similar messages Mar 16 23:13:21 fir-io7-s1 kernel: LNetError: 100440:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 23:13:21 fir-io7-s1 kernel: LNetError: 100440:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 98 previous similar messages Mar 16 23:14:01 fir-io7-s1 kernel: LNetError: 100440:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 23:14:01 fir-io7-s1 kernel: LNetError: 100440:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 16 23:20:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 1 seconds Mar 16 23:20:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 581 previous similar messages Mar 16 23:23:31 fir-io7-s1 kernel: LNetError: 101098:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 23:23:31 fir-io7-s1 kernel: LNetError: 101098:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 112 previous similar messages Mar 16 23:24:06 fir-io7-s1 kernel: LNetError: 101098:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 23:24:06 fir-io7-s1 kernel: LNetError: 101098:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 16 23:25:25 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 80d08068-682f-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6991ced800, cur 1584426325 expire 1584426175 last 1584426098 Mar 16 23:25:25 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:26:57 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 23:26:57 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:30:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 1 seconds Mar 16 23:30:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 684 previous similar messages Mar 16 23:33:36 fir-io7-s1 kernel: LNetError: 101098:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 23:33:36 fir-io7-s1 kernel: LNetError: 101098:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 122 previous similar messages Mar 16 23:34:06 fir-io7-s1 kernel: LNetError: 101098:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 23:34:06 fir-io7-s1 kernel: LNetError: 101098:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 16 23:37:55 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 23:37:55 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:39:07 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client d473f17f-04f4-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6991f96400, cur 1584427147 expire 1584426997 last 1584426920 Mar 16 23:39:07 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:40:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 16 23:40:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 689 previous similar messages Mar 16 23:43:36 fir-io7-s1 kernel: LNetError: 101971:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 16 23:43:36 fir-io7-s1 kernel: LNetError: 101971:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 145 previous similar messages Mar 16 23:44:11 fir-io7-s1 kernel: LNetError: 101098:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 23:44:11 fir-io7-s1 kernel: LNetError: 101098:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 16 23:49:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 16 23:49:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:50:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 3 seconds Mar 16 23:50:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 651 previous similar messages Mar 16 23:50:38 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client b1a5af66-a9b6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69929a2c00, cur 1584427838 expire 1584427688 last 1584427611 Mar 16 23:50:38 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 16 23:53:36 fir-io7-s1 kernel: LNetError: 102052:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 16 23:53:36 fir-io7-s1 kernel: LNetError: 102052:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 16 23:54:16 fir-io7-s1 kernel: LNetError: 102403:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 16 23:54:16 fir-io7-s1 kernel: LNetError: 102403:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 357 previous similar messages Mar 17 00:00:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 2 seconds Mar 17 00:00:38 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 605 previous similar messages Mar 17 00:02:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 00:02:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:03:27 fir-io7-s1 kernel: Lustre: fir-OST0048: haven't heard from client 0350f539-9f48-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c79ced88800, cur 1584428607 expire 1584428457 last 1584428380 Mar 17 00:03:27 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:03:36 fir-io7-s1 kernel: LNetError: 102676:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 00:03:36 fir-io7-s1 kernel: LNetError: 102676:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 17 00:04:21 fir-io7-s1 kernel: LNetError: 102676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 00:04:21 fir-io7-s1 kernel: LNetError: 102676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 382 previous similar messages Mar 17 00:10:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 17 00:10:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 553 previous similar messages Mar 17 00:13:36 fir-io7-s1 kernel: LNetError: 102648:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 00:13:36 fir-io7-s1 kernel: LNetError: 102648:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 230 previous similar messages Mar 17 00:14:26 fir-io7-s1 kernel: LNetError: 102676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 00:14:26 fir-io7-s1 kernel: LNetError: 102676:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 358 previous similar messages Mar 17 00:15:44 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client 6dea263f-344e-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c698f5bfc00, cur 1584429344 expire 1584429194 last 1584429117 Mar 17 00:15:44 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:16:41 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 00:16:41 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:20:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 17 00:20:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 574 previous similar messages Mar 17 00:23:41 fir-io7-s1 kernel: LNetError: 103490:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 00:23:41 fir-io7-s1 kernel: LNetError: 103490:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 17 00:24:36 fir-io7-s1 kernel: LNetError: 103490:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 00:24:36 fir-io7-s1 kernel: LNetError: 103490:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 17 00:28:50 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client 89248df3-9543-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c768773b400, cur 1584430130 expire 1584429980 last 1584429903 Mar 17 00:28:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:29:43 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 00:29:43 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:30:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 6 seconds Mar 17 00:30:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 536 previous similar messages Mar 17 00:33:41 fir-io7-s1 kernel: LNetError: 103490:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 00:33:41 fir-io7-s1 kernel: LNetError: 103490:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 165 previous similar messages Mar 17 00:34:36 fir-io7-s1 kernel: LNetError: 103946:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 00:34:36 fir-io7-s1 kernel: LNetError: 103946:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 318 previous similar messages Mar 17 00:39:50 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 00:39:50 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:40:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 1 seconds Mar 17 00:40:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 572 previous similar messages Mar 17 00:41:03 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 457d862e-dffa-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6efeaf1000, cur 1584430863 expire 1584430713 last 1584430636 Mar 17 00:41:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:43:46 fir-io7-s1 kernel: LNetError: 103946:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 00:43:46 fir-io7-s1 kernel: LNetError: 103946:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 17 00:44:36 fir-io7-s1 kernel: LNetError: 104337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 00:44:36 fir-io7-s1 kernel: LNetError: 104337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 00:51:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 17 00:51:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 554 previous similar messages Mar 17 00:52:33 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client faf4655a-d9dd-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c6b34efec00, cur 1584431553 expire 1584431403 last 1584431326 Mar 17 00:52:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:52:47 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 00:52:47 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 00:53:46 fir-io7-s1 kernel: LNetError: 104602:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 00:53:46 fir-io7-s1 kernel: LNetError: 104602:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 185 previous similar messages Mar 17 00:54:41 fir-io7-s1 kernel: LNetError: 104602:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 00:54:41 fir-io7-s1 kernel: LNetError: 104602:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 336 previous similar messages Mar 17 01:01:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 1 seconds Mar 17 01:01:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 547 previous similar messages Mar 17 01:03:08 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 01:03:08 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 01:03:46 fir-io7-s1 kernel: LNetError: 104986:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 01:03:46 fir-io7-s1 kernel: LNetError: 104986:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 168 previous similar messages Mar 17 01:04:03 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client fad43a06-12c6-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c66bd34fc00, cur 1584432243 expire 1584432093 last 1584432016 Mar 17 01:04:03 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 01:04:41 fir-io7-s1 kernel: LNetError: 104986:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 01:04:41 fir-io7-s1 kernel: LNetError: 104986:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 01:11:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 17 01:11:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 490 previous similar messages Mar 17 01:13:46 fir-io7-s1 kernel: LNetError: 104986:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 01:13:46 fir-io7-s1 kernel: LNetError: 104986:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 17 01:14:41 fir-io7-s1 kernel: LNetError: 105509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 01:14:41 fir-io7-s1 kernel: LNetError: 105509:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 17 01:21:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 17 01:21:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 529 previous similar messages Mar 17 01:23:46 fir-io7-s1 kernel: LNetError: 105509:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 01:23:46 fir-io7-s1 kernel: LNetError: 105509:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 186 previous similar messages Mar 17 01:24:46 fir-io7-s1 kernel: LNetError: 105885:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 01:24:46 fir-io7-s1 kernel: LNetError: 105885:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 343 previous similar messages Mar 17 01:31:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 17 01:31:22 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 497 previous similar messages Mar 17 01:33:46 fir-io7-s1 kernel: LNetError: 105885:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 01:33:46 fir-io7-s1 kernel: LNetError: 105885:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 239 previous similar messages Mar 17 01:34:46 fir-io7-s1 kernel: LNetError: 106265:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 01:34:46 fir-io7-s1 kernel: LNetError: 106265:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 17 01:41:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 4 seconds Mar 17 01:41:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 507 previous similar messages Mar 17 01:43:51 fir-io7-s1 kernel: LNetError: 106265:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 01:43:51 fir-io7-s1 kernel: LNetError: 106265:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 212 previous similar messages Mar 17 01:44:51 fir-io7-s1 kernel: LNetError: 106656:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 01:44:51 fir-io7-s1 kernel: LNetError: 106656:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 17 01:51:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 17 01:51:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 331 previous similar messages Mar 17 01:53:56 fir-io7-s1 kernel: LNetError: 106656:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 01:53:56 fir-io7-s1 kernel: LNetError: 106656:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 17 01:55:01 fir-io7-s1 kernel: LNetError: 107034:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 01:55:01 fir-io7-s1 kernel: LNetError: 107034:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 17 02:01:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 17 02:01:49 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 310 previous similar messages Mar 17 02:03:56 fir-io7-s1 kernel: LNetError: 107034:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 02:03:56 fir-io7-s1 kernel: LNetError: 107034:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 17 02:05:06 fir-io7-s1 kernel: LNetError: 107438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 02:05:06 fir-io7-s1 kernel: LNetError: 107438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 306 previous similar messages Mar 17 02:11:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 17 02:11:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 355 previous similar messages Mar 17 02:13:56 fir-io7-s1 kernel: LNetError: 106179:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 02:13:56 fir-io7-s1 kernel: LNetError: 106179:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 17 02:15:16 fir-io7-s1 kernel: LNetError: 107438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 02:15:16 fir-io7-s1 kernel: LNetError: 107438:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 17 02:22:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 17 02:22:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 475 previous similar messages Mar 17 02:24:01 fir-io7-s1 kernel: LNetError: 93137:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 02:24:01 fir-io7-s1 kernel: LNetError: 93137:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 17 02:25:16 fir-io7-s1 kernel: LNetError: 107879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 02:25:16 fir-io7-s1 kernel: LNetError: 107879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 02:32:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 9 seconds Mar 17 02:32:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 459 previous similar messages Mar 17 02:34:01 fir-io7-s1 kernel: LNetError: 108528:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 02:34:01 fir-io7-s1 kernel: LNetError: 108528:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 173 previous similar messages Mar 17 02:35:21 fir-io7-s1 kernel: LNetError: 108528:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 02:35:21 fir-io7-s1 kernel: LNetError: 108528:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 17 02:42:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 17 02:42:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 496 previous similar messages Mar 17 02:44:01 fir-io7-s1 kernel: LNetError: 108778:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 02:44:01 fir-io7-s1 kernel: LNetError: 108778:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 17 02:45:26 fir-io7-s1 kernel: LNetError: 93137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 02:45:26 fir-io7-s1 kernel: LNetError: 93137:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 293 previous similar messages Mar 17 02:52:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 17 02:52:45 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 374 previous similar messages Mar 17 02:54:01 fir-io7-s1 kernel: LNetError: 109290:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 02:54:01 fir-io7-s1 kernel: LNetError: 109290:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 17 02:55:26 fir-io7-s1 kernel: LNetError: 109290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 02:55:26 fir-io7-s1 kernel: LNetError: 109290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 17 03:02:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 2 seconds Mar 17 03:02:52 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 172 previous similar messages Mar 17 03:04:01 fir-io7-s1 kernel: LNetError: 109290:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 03:04:01 fir-io7-s1 kernel: LNetError: 109290:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 202 previous similar messages Mar 17 03:05:26 fir-io7-s1 kernel: LNetError: 109755:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 03:05:26 fir-io7-s1 kernel: LNetError: 109755:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 292 previous similar messages Mar 17 03:08:49 fir-io7-s1 kernel: Lustre: fir-OST004c: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 03:08:49 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 17 03:09:45 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client bef97b9d-51e8-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c7b14733c00, cur 1584439785 expire 1584439635 last 1584439558 Mar 17 03:09:45 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 03:13:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.225@o2ib7: 0 seconds Mar 17 03:13:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 213 previous similar messages Mar 17 03:14:01 fir-io7-s1 kernel: LNetError: 108984:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 03:14:01 fir-io7-s1 kernel: LNetError: 108984:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 17 03:15:26 fir-io7-s1 kernel: LNetError: 109755:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 03:15:26 fir-io7-s1 kernel: LNetError: 109755:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 17 03:23:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 1 seconds Mar 17 03:23:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 406 previous similar messages Mar 17 03:24:01 fir-io7-s1 kernel: LNetError: 110178:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 03:24:01 fir-io7-s1 kernel: LNetError: 110178:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 200 previous similar messages Mar 17 03:25:31 fir-io7-s1 kernel: LNetError: 110515:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 03:25:31 fir-io7-s1 kernel: LNetError: 110515:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 17 03:33:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 4 seconds Mar 17 03:33:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages Mar 17 03:34:06 fir-io7-s1 kernel: LNetError: 110706:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 03:34:06 fir-io7-s1 kernel: LNetError: 110706:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 17 03:35:31 fir-io7-s1 kernel: LNetError: 110706:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 03:35:31 fir-io7-s1 kernel: LNetError: 110706:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 304 previous similar messages Mar 17 03:43:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 17 03:43:20 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 525 previous similar messages Mar 17 03:44:06 fir-io7-s1 kernel: LNetError: 110778:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 03:44:06 fir-io7-s1 kernel: LNetError: 110778:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 17 03:45:15 fir-io7-s1 kernel: Lustre: fir-OST0052: haven't heard from client a1ba2593-106e-4 (at 10.50.5.52@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c4f90092400, cur 1584441915 expire 1584441765 last 1584441688 Mar 17 03:45:15 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 03:45:18 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to a1ba2593-106e-4 (at 10.50.5.52@o2ib2) Mar 17 03:45:18 fir-io7-s1 kernel: Lustre: Skipped 4 previous similar messages Mar 17 03:45:41 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client a1ba2593-106e-4 (at 10.50.5.52@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c8243e2a000, cur 1584441941 expire 1584441791 last 1584441714 Mar 17 03:45:41 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 17 03:45:41 fir-io7-s1 kernel: LNetError: 110955:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 03:45:41 fir-io7-s1 kernel: LNetError: 110955:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 17 03:53:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 7 seconds Mar 17 03:53:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 587 previous similar messages Mar 17 03:54:06 fir-io7-s1 kernel: LNetError: 111369:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 03:54:06 fir-io7-s1 kernel: LNetError: 111369:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 163 previous similar messages Mar 17 03:55:46 fir-io7-s1 kernel: LNetError: 111715:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 03:55:46 fir-io7-s1 kernel: LNetError: 111715:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 288 previous similar messages Mar 17 04:03:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 17 04:03:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 515 previous similar messages Mar 17 04:04:06 fir-io7-s1 kernel: LNetError: 111715:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 04:04:06 fir-io7-s1 kernel: LNetError: 111715:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 149 previous similar messages Mar 17 04:05:46 fir-io7-s1 kernel: LNetError: 112113:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 04:05:46 fir-io7-s1 kernel: LNetError: 112113:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 17 04:13:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.234@o2ib7: 0 seconds Mar 17 04:13:50 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 136 previous similar messages Mar 17 04:14:06 fir-io7-s1 kernel: LNetError: 112417:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 04:14:06 fir-io7-s1 kernel: LNetError: 112417:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 17 04:15:46 fir-io7-s1 kernel: LNetError: 112417:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 04:15:46 fir-io7-s1 kernel: LNetError: 112417:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 17 04:24:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 0 seconds Mar 17 04:24:06 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 369 previous similar messages Mar 17 04:24:06 fir-io7-s1 kernel: LNetError: 112769:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 04:24:06 fir-io7-s1 kernel: LNetError: 112769:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 171 previous similar messages Mar 17 04:25:46 fir-io7-s1 kernel: LNetError: 112769:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 04:25:46 fir-io7-s1 kernel: LNetError: 112769:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 302 previous similar messages Mar 17 04:34:06 fir-io7-s1 kernel: LNetError: 113231:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 04:34:06 fir-io7-s1 kernel: LNetError: 113231:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 193 previous similar messages Mar 17 04:34:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.233@o2ib7: 10 seconds Mar 17 04:34:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 565 previous similar messages Mar 17 04:35:51 fir-io7-s1 kernel: LNetError: 113231:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 04:35:51 fir-io7-s1 kernel: LNetError: 113231:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 04:44:07 fir-io7-s1 kernel: LNetError: 113231:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 04:44:07 fir-io7-s1 kernel: LNetError: 113231:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 214 previous similar messages Mar 17 04:44:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 4 seconds Mar 17 04:44:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 448 previous similar messages Mar 17 04:45:51 fir-io7-s1 kernel: LNetError: 113637:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 04:45:51 fir-io7-s1 kernel: LNetError: 113637:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 327 previous similar messages Mar 17 04:54:07 fir-io7-s1 kernel: LNetError: 113637:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 04:54:07 fir-io7-s1 kernel: LNetError: 113637:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 217 previous similar messages Mar 17 04:54:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 5 seconds Mar 17 04:54:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 352 previous similar messages Mar 17 04:55:57 fir-io7-s1 kernel: LNetError: 114022:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 04:55:57 fir-io7-s1 kernel: LNetError: 114022:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 319 previous similar messages Mar 17 05:04:16 fir-io7-s1 kernel: LNetError: 114379:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 05:04:16 fir-io7-s1 kernel: LNetError: 114379:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 17 05:04:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 17 05:04:18 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 524 previous similar messages Mar 17 05:06:01 fir-io7-s1 kernel: LNetError: 114379:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 05:06:01 fir-io7-s1 kernel: LNetError: 114379:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 298 previous similar messages Mar 17 05:14:16 fir-io7-s1 kernel: LNetError: 114726:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 05:14:16 fir-io7-s1 kernel: LNetError: 114726:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 196 previous similar messages Mar 17 05:14:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 5 seconds Mar 17 05:14:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 325 previous similar messages Mar 17 05:16:06 fir-io7-s1 kernel: LNetError: 114379:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 05:16:06 fir-io7-s1 kernel: LNetError: 114379:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 17 05:16:49 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 05:16:49 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 05:18:03 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client e5fed329-a06c-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6f91f93400, cur 1584447483 expire 1584447333 last 1584447256 Mar 17 05:18:03 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 17 05:24:16 fir-io7-s1 kernel: LNetError: 111958:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 05:24:16 fir-io7-s1 kernel: LNetError: 111958:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 182 previous similar messages Mar 17 05:24:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.237@o2ib7: 0 seconds Mar 17 05:24:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 466 previous similar messages Mar 17 05:26:11 fir-io7-s1 kernel: LNetError: 114879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 05:26:11 fir-io7-s1 kernel: LNetError: 114879:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 17 05:34:21 fir-io7-s1 kernel: LNetError: 115255:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 05:34:21 fir-io7-s1 kernel: LNetError: 115255:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 158 previous similar messages Mar 17 05:34:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 17 05:34:31 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 461 previous similar messages Mar 17 05:36:11 fir-io7-s1 kernel: LNetError: 115564:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 05:36:11 fir-io7-s1 kernel: LNetError: 115564:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 05:44:21 fir-io7-s1 kernel: LNetError: 115803:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 05:44:21 fir-io7-s1 kernel: LNetError: 115803:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 17 05:44:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 7 seconds Mar 17 05:44:36 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 485 previous similar messages Mar 17 05:46:16 fir-io7-s1 kernel: LNetError: 115803:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 05:46:16 fir-io7-s1 kernel: LNetError: 115803:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 310 previous similar messages Mar 17 05:54:21 fir-io7-s1 kernel: LNetError: 116024:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 05:54:21 fir-io7-s1 kernel: LNetError: 116024:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 180 previous similar messages Mar 17 05:54:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 17 05:54:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 404 previous similar messages Mar 17 05:56:16 fir-io7-s1 kernel: LNetError: 116328:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 05:56:16 fir-io7-s1 kernel: LNetError: 116328:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 17 06:04:21 fir-io7-s1 kernel: LNetError: 116328:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 06:04:21 fir-io7-s1 kernel: LNetError: 116328:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 17 06:04:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 5 seconds Mar 17 06:04:51 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 353 previous similar messages Mar 17 06:06:21 fir-io7-s1 kernel: LNetError: 116725:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 06:06:21 fir-io7-s1 kernel: LNetError: 116725:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 17 06:14:21 fir-io7-s1 kernel: LNetError: 116682:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 06:14:21 fir-io7-s1 kernel: LNetError: 116682:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 172 previous similar messages Mar 17 06:14:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 9 seconds Mar 17 06:14:56 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 419 previous similar messages Mar 17 06:16:26 fir-io7-s1 kernel: LNetError: 116724:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 06:16:26 fir-io7-s1 kernel: LNetError: 116724:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 300 previous similar messages Mar 17 06:24:31 fir-io7-s1 kernel: LNetError: 117366:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 06:24:31 fir-io7-s1 kernel: LNetError: 117366:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 17 06:25:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 17 06:25:01 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 447 previous similar messages Mar 17 06:26:31 fir-io7-s1 kernel: LNetError: 117366:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 06:26:31 fir-io7-s1 kernel: LNetError: 117366:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 06:34:31 fir-io7-s1 kernel: LNetError: 117860:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 06:34:31 fir-io7-s1 kernel: LNetError: 117860:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 162 previous similar messages Mar 17 06:35:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 17 06:35:02 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 449 previous similar messages Mar 17 06:36:31 fir-io7-s1 kernel: LNetError: 117860:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 06:36:31 fir-io7-s1 kernel: LNetError: 117860:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 17 06:44:31 fir-io7-s1 kernel: LNetError: 117860:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 06:44:31 fir-io7-s1 kernel: LNetError: 117860:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 167 previous similar messages Mar 17 06:45:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 0 seconds Mar 17 06:45:04 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 455 previous similar messages Mar 17 06:46:31 fir-io7-s1 kernel: LNetError: 118248:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 06:46:31 fir-io7-s1 kernel: LNetError: 118248:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 06:54:31 fir-io7-s1 kernel: LNetError: 118248:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 06:54:31 fir-io7-s1 kernel: LNetError: 118248:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 17 06:55:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 0 seconds Mar 17 06:55:16 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 404 previous similar messages Mar 17 06:56:31 fir-io7-s1 kernel: LNetError: 118625:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 06:56:31 fir-io7-s1 kernel: LNetError: 118625:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 07:04:31 fir-io7-s1 kernel: LNetError: 118625:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 07:04:31 fir-io7-s1 kernel: LNetError: 118625:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 17 07:05:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 17 07:05:23 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 409 previous similar messages Mar 17 07:06:36 fir-io7-s1 kernel: LNetError: 119017:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 07:06:36 fir-io7-s1 kernel: LNetError: 119017:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 17 07:14:31 fir-io7-s1 kernel: LNetError: 119357:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 07:14:31 fir-io7-s1 kernel: LNetError: 119357:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 166 previous similar messages Mar 17 07:15:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 17 07:15:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 455 previous similar messages Mar 17 07:16:41 fir-io7-s1 kernel: LNetError: 119357:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 07:16:41 fir-io7-s1 kernel: LNetError: 119357:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 07:17:28 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 07:17:28 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 07:18:21 fir-io7-s1 kernel: Lustre: fir-OST004e: haven't heard from client b09288b0-7ef1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c552bd42c00, cur 1584454701 expire 1584454551 last 1584454474 Mar 17 07:18:21 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 07:18:22 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client b09288b0-7ef1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69922e7800, cur 1584454702 expire 1584454552 last 1584454475 Mar 17 07:18:23 fir-io7-s1 kernel: Lustre: fir-OST004c: haven't heard from client b09288b0-7ef1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c69915f1000, cur 1584454703 expire 1584454553 last 1584454476 Mar 17 07:18:23 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 17 07:18:32 fir-io7-s1 kernel: Lustre: fir-OST004a: haven't heard from client b09288b0-7ef1-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c552bd41c00, cur 1584454712 expire 1584454562 last 1584454485 Mar 17 07:24:31 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 07:24:31 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 17 07:25:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 7 seconds Mar 17 07:25:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 773 previous similar messages Mar 17 07:26:41 fir-io7-s1 kernel: LNetError: 119357:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 07:26:41 fir-io7-s1 kernel: LNetError: 119357:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 17 07:29:26 fir-io7-s1 kernel: Lustre: fir-OST0050: haven't heard from client 0484804a-325b-4 (at 10.49.26.4@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c6bfe4cbc00, cur 1584455366 expire 1584455216 last 1584455139 Mar 17 07:29:26 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 17 07:29:33 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ae2c84fa-dc5c-4 (at 10.49.26.4@o2ib1) Mar 17 07:29:33 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 07:34:41 fir-io7-s1 kernel: LNetError: 119857:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 07:34:41 fir-io7-s1 kernel: LNetError: 119857:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 100 previous similar messages Mar 17 07:35:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 0 seconds Mar 17 07:35:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 718 previous similar messages Mar 17 07:36:41 fir-io7-s1 kernel: LNetError: 120165:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 07:36:41 fir-io7-s1 kernel: LNetError: 120165:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 17 07:44:46 fir-io7-s1 kernel: LNetError: 120396:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 07:44:46 fir-io7-s1 kernel: LNetError: 120396:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 87 previous similar messages Mar 17 07:45:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 17 07:45:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 684 previous similar messages Mar 17 07:46:41 fir-io7-s1 kernel: LNetError: 120396:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 07:46:41 fir-io7-s1 kernel: LNetError: 120396:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 07:54:56 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 07:54:56 fir-io7-s1 kernel: LNetError: 120619:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 87 previous similar messages Mar 17 07:55:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 17 07:55:46 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 647 previous similar messages Mar 17 07:56:41 fir-io7-s1 kernel: LNetError: 120939:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 07:56:41 fir-io7-s1 kernel: LNetError: 120939:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 08:05:01 fir-io7-s1 kernel: LNetError: 120939:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 08:05:01 fir-io7-s1 kernel: LNetError: 120939:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 113 previous similar messages Mar 17 08:05:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 9 seconds Mar 17 08:05:47 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 611 previous similar messages Mar 17 08:06:51 fir-io7-s1 kernel: LNetError: 121337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 08:06:51 fir-io7-s1 kernel: LNetError: 121337:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 17 08:15:11 fir-io7-s1 kernel: LNetError: 121337:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 08:15:11 fir-io7-s1 kernel: LNetError: 121337:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 89 previous similar messages Mar 17 08:15:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.219@o2ib7: 7 seconds Mar 17 08:15:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 675 previous similar messages Mar 17 08:16:51 fir-io7-s1 kernel: LNetError: 121738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 08:16:51 fir-io7-s1 kernel: LNetError: 121738:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 17 08:25:16 fir-io7-s1 kernel: LNetError: 121965:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 08:25:16 fir-io7-s1 kernel: LNetError: 121965:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 103 previous similar messages Mar 17 08:25:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 6 seconds Mar 17 08:25:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 646 previous similar messages Mar 17 08:26:56 fir-io7-s1 kernel: LNetError: 121965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 08:26:56 fir-io7-s1 kernel: LNetError: 121965:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 17 08:35:21 fir-io7-s1 kernel: LNetError: 122434:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 08:35:21 fir-io7-s1 kernel: LNetError: 122434:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 94 previous similar messages Mar 17 08:36:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 9 seconds Mar 17 08:36:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 657 previous similar messages Mar 17 08:37:06 fir-io7-s1 kernel: LNetError: 122434:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 08:37:06 fir-io7-s1 kernel: LNetError: 122434:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 280 previous similar messages Mar 17 08:45:26 fir-io7-s1 kernel: LNetError: 122803:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 08:45:26 fir-io7-s1 kernel: LNetError: 122803:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 128 previous similar messages Mar 17 08:46:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 17 08:46:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 584 previous similar messages Mar 17 08:47:06 fir-io7-s1 kernel: LNetError: 122803:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 08:47:06 fir-io7-s1 kernel: LNetError: 122803:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 289 previous similar messages Mar 17 08:55:31 fir-io7-s1 kernel: LNetError: 122803:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 08:55:31 fir-io7-s1 kernel: LNetError: 122803:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 153 previous similar messages Mar 17 08:56:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 4 seconds Mar 17 08:56:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 493 previous similar messages Mar 17 08:57:11 fir-io7-s1 kernel: LNetError: 123261:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 08:57:11 fir-io7-s1 kernel: LNetError: 123261:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 17 09:05:31 fir-io7-s1 kernel: LNetError: 122697:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 09:05:31 fir-io7-s1 kernel: LNetError: 122697:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 106 previous similar messages Mar 17 09:06:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 17 09:06:21 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 639 previous similar messages Mar 17 09:07:11 fir-io7-s1 kernel: LNetError: 123261:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 09:07:11 fir-io7-s1 kernel: LNetError: 123261:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 17 09:15:31 fir-io7-s1 kernel: LNetError: 123728:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 09:15:31 fir-io7-s1 kernel: LNetError: 123728:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 124 previous similar messages Mar 17 09:16:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 7 seconds Mar 17 09:16:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 640 previous similar messages Mar 17 09:17:16 fir-io7-s1 kernel: LNetError: 124037:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 09:17:16 fir-io7-s1 kernel: LNetError: 124037:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 299 previous similar messages Mar 17 09:25:31 fir-io7-s1 kernel: LNetError: 124037:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 09:25:31 fir-io7-s1 kernel: LNetError: 124037:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 100 previous similar messages Mar 17 09:26:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 0 seconds Mar 17 09:26:28 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 693 previous similar messages Mar 17 09:27:21 fir-io7-s1 kernel: LNetError: 124422:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 09:27:21 fir-io7-s1 kernel: LNetError: 124422:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 301 previous similar messages Mar 17 09:35:31 fir-io7-s1 kernel: LNetError: 124651:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 09:35:31 fir-io7-s1 kernel: LNetError: 124651:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 174 previous similar messages Mar 17 09:36:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 5 seconds Mar 17 09:36:32 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 692 previous similar messages Mar 17 09:37:21 fir-io7-s1 kernel: LNetError: 124651:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.203@o2ib7 rejected: consumer defined fatal error Mar 17 09:37:21 fir-io7-s1 kernel: LNetError: 124651:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 09:45:31 fir-io7-s1 kernel: LNetError: 124729:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 09:45:31 fir-io7-s1 kernel: LNetError: 124729:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 17 09:46:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 17 09:46:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 605 previous similar messages Mar 17 09:47:21 fir-io7-s1 kernel: LNetError: 125100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 09:47:21 fir-io7-s1 kernel: LNetError: 125100:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 09:55:41 fir-io7-s1 kernel: LNetError: 125100:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 09:55:41 fir-io7-s1 kernel: LNetError: 125100:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 17 09:56:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 6 seconds Mar 17 09:56:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 627 previous similar messages Mar 17 09:57:26 fir-io7-s1 kernel: LNetError: 125559:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 17 09:57:26 fir-io7-s1 kernel: LNetError: 125559:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 17 10:05:41 fir-io7-s1 kernel: LNetError: 125559:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 10:05:41 fir-io7-s1 kernel: LNetError: 125559:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 226 previous similar messages Mar 17 10:06:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.237@o2ib7: 0 seconds Mar 17 10:06:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 613 previous similar messages Mar 17 10:07:26 fir-io7-s1 kernel: LNetError: 125968:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 10:07:26 fir-io7-s1 kernel: LNetError: 125968:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 10:15:46 fir-io7-s1 kernel: LNetError: 126290:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 10:15:46 fir-io7-s1 kernel: LNetError: 126290:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 183 previous similar messages Mar 17 10:16:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.239@o2ib7: 10 seconds Mar 17 10:16:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 584 previous similar messages Mar 17 10:17:31 fir-io7-s1 kernel: LNetError: 126290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 10:17:31 fir-io7-s1 kernel: LNetError: 126290:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 287 previous similar messages Mar 17 10:25:46 fir-io7-s1 kernel: LNetError: 126131:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 10:25:46 fir-io7-s1 kernel: LNetError: 126131:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 197 previous similar messages Mar 17 10:26:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 17 10:26:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 517 previous similar messages Mar 17 10:27:41 fir-io7-s1 kernel: LNetError: 126423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 17 10:27:41 fir-io7-s1 kernel: LNetError: 126423:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 17 10:35:46 fir-io7-s1 kernel: LNetError: 126423:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 10:35:46 fir-io7-s1 kernel: LNetError: 126423:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 181 previous similar messages Mar 17 10:37:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 17 10:37:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 478 previous similar messages Mar 17 10:37:51 fir-io7-s1 kernel: LNetError: 127112:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 10:37:51 fir-io7-s1 kernel: LNetError: 127112:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 17 10:45:46 fir-io7-s1 kernel: LNetError: 127382:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 10:45:46 fir-io7-s1 kernel: LNetError: 127382:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 198 previous similar messages Mar 17 10:47:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.216@o2ib7: 0 seconds Mar 17 10:47:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 526 previous similar messages Mar 17 10:47:56 fir-io7-s1 kernel: LNetError: 127382:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 10:47:56 fir-io7-s1 kernel: LNetError: 127382:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 17 10:55:46 fir-io7-s1 kernel: LNetError: 127587:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 10:55:46 fir-io7-s1 kernel: LNetError: 127587:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 234 previous similar messages Mar 17 10:57:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.233@o2ib7: 2 seconds Mar 17 10:57:27 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 469 previous similar messages Mar 17 10:57:56 fir-io7-s1 kernel: LNetError: 127925:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 10:57:56 fir-io7-s1 kernel: LNetError: 127925:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 320 previous similar messages Mar 17 11:05:46 fir-io7-s1 kernel: LNetError: 127381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 11:05:46 fir-io7-s1 kernel: LNetError: 127381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 223 previous similar messages Mar 17 11:07:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 17 11:07:29 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 410 previous similar messages Mar 17 11:07:56 fir-io7-s1 kernel: LNetError: 127925:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 11:07:56 fir-io7-s1 kernel: LNetError: 127925:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 346 previous similar messages Mar 17 11:15:46 fir-io7-s1 kernel: LNetError: 127381:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 11:15:46 fir-io7-s1 kernel: LNetError: 127381:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 17 11:17:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.239@o2ib7: 6 seconds Mar 17 11:17:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 404 previous similar messages Mar 17 11:17:56 fir-io7-s1 kernel: LNetError: 128417:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 11:17:56 fir-io7-s1 kernel: LNetError: 128417:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 308 previous similar messages Mar 17 11:25:46 fir-io7-s1 kernel: LNetError: 128805:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 11:25:46 fir-io7-s1 kernel: LNetError: 128805:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 191 previous similar messages Mar 17 11:27:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 0 seconds Mar 17 11:27:39 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 464 previous similar messages Mar 17 11:28:06 fir-io7-s1 kernel: LNetError: 129099:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 11:28:06 fir-io7-s1 kernel: LNetError: 129099:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 322 previous similar messages Mar 17 11:33:05 fir-io7-s1 kernel: Lustre: fir-OST004e: Client ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) reconnecting Mar 17 11:33:05 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 17 11:33:05 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:33:08 fir-io7-s1 kernel: Lustre: fir-OST0050: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:33:08 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 17 11:33:16 fir-io7-s1 kernel: Lustre: fir-OST0048: Client ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) reconnecting Mar 17 11:33:16 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 17 11:33:16 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:33:30 fir-io7-s1 kernel: LustreError: 137-5: fir-OST0049_UUID: not available for connect from 10.49.29.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Mar 17 11:33:33 fir-io7-s1 kernel: Lustre: fir-OST004e: Client ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) reconnecting Mar 17 11:33:33 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:33:36 fir-io7-s1 kernel: LustreError: 137-5: fir-OST0051_UUID: not available for connect from 10.49.29.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 17 11:33:38 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004f_UUID: not available for connect from 10.49.29.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 17 11:33:41 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004b_UUID: not available for connect from 10.49.29.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Mar 17 11:33:44 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:33:44 fir-io7-s1 kernel: Lustre: Skipped 1 previous similar message Mar 17 11:34:09 fir-io7-s1 kernel: LustreError: 137-5: fir-OST004d_UUID: not available for connect from 10.49.29.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Mar 17 11:34:16 fir-io7-s1 kernel: Lustre: fir-OST004e: Client ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) reconnecting Mar 17 11:34:16 fir-io7-s1 kernel: Lustre: Skipped 5 previous similar messages Mar 17 11:34:16 fir-io7-s1 kernel: Lustre: fir-OST004e: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:34:16 fir-io7-s1 kernel: Lustre: Skipped 3 previous similar messages Mar 17 11:34:32 fir-io7-s1 kernel: Lustre: fir-OST0048: Connection restored to ab87f0a5-0357-4 (at 10.49.29.5@o2ib1) Mar 17 11:34:32 fir-io7-s1 kernel: Lustre: Skipped 2 previous similar messages Mar 17 11:35:46 fir-io7-s1 kernel: LNetError: 128792:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 11:35:46 fir-io7-s1 kernel: LNetError: 128792:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 192 previous similar messages Mar 17 11:37:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 5 seconds Mar 17 11:37:42 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 445 previous similar messages Mar 17 11:38:11 fir-io7-s1 kernel: LNetError: 129364:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 11:38:11 fir-io7-s1 kernel: LNetError: 129364:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 315 previous similar messages Mar 17 11:45:48 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 11:45:48 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 203 previous similar messages Mar 17 11:47:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.235@o2ib7: 0 seconds Mar 17 11:47:44 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 421 previous similar messages Mar 17 11:48:21 fir-io7-s1 kernel: LNetError: 129816:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 11:48:21 fir-io7-s1 kernel: LNetError: 129816:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 294 previous similar messages Mar 17 11:55:51 fir-io7-s1 kernel: LNetError: 130186:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 11:55:51 fir-io7-s1 kernel: LNetError: 130186:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 206 previous similar messages Mar 17 11:57:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 1 seconds Mar 17 11:57:58 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 327 previous similar messages Mar 17 11:58:21 fir-io7-s1 kernel: LNetError: 130186:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 11:58:21 fir-io7-s1 kernel: LNetError: 130186:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 329 previous similar messages Mar 17 12:05:51 fir-io7-s1 kernel: LNetError: 130529:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 12:05:51 fir-io7-s1 kernel: LNetError: 130529:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 199 previous similar messages Mar 17 12:08:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.227@o2ib7: 0 seconds Mar 17 12:08:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 204 previous similar messages Mar 17 12:08:21 fir-io7-s1 kernel: LNetError: 130675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 12:08:21 fir-io7-s1 kernel: LNetError: 130675:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 370 previous similar messages Mar 17 12:15:51 fir-io7-s1 kernel: LNetError: 130529:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 12:15:51 fir-io7-s1 kernel: LNetError: 130529:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 209 previous similar messages Mar 17 12:18:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 17 12:18:08 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 351 previous similar messages Mar 17 12:18:26 fir-io7-s1 kernel: LNetError: 336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 12:18:26 fir-io7-s1 kernel: LNetError: 336:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 405 previous similar messages Mar 17 12:25:51 fir-io7-s1 kernel: LNetError: 666:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 12:25:51 fir-io7-s1 kernel: LNetError: 666:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 244 previous similar messages Mar 17 12:28:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.224@o2ib7: 0 seconds Mar 17 12:28:12 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 501 previous similar messages Mar 17 12:28:31 fir-io7-s1 kernel: LNetError: 744:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 12:28:31 fir-io7-s1 kernel: LNetError: 744:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 415 previous similar messages Mar 17 12:35:52 fir-io7-s1 kernel: LNetError: 744:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 12:35:52 fir-io7-s1 kernel: LNetError: 744:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 241 previous similar messages Mar 17 12:38:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.234@o2ib7: 0 seconds Mar 17 12:38:17 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 493 previous similar messages Mar 17 12:38:32 fir-io7-s1 kernel: LNetError: 1069:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 12:38:32 fir-io7-s1 kernel: LNetError: 1069:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 399 previous similar messages Mar 17 12:45:52 fir-io7-s1 kernel: LNetError: 1287:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 12:45:52 fir-io7-s1 kernel: LNetError: 1287:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 233 previous similar messages Mar 17 12:48:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.217@o2ib7: 0 seconds Mar 17 12:48:25 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 396 previous similar messages Mar 17 12:48:37 fir-io7-s1 kernel: LNetError: 118397:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.202@o2ib7 rejected: consumer defined fatal error Mar 17 12:48:37 fir-io7-s1 kernel: LNetError: 118397:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 386 previous similar messages Mar 17 12:55:52 fir-io7-s1 kernel: LNetError: 1287:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 12:55:52 fir-io7-s1 kernel: LNetError: 1287:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 176 previous similar messages Mar 17 12:58:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.232@o2ib7: 0 seconds Mar 17 12:58:26 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 198 previous similar messages Mar 17 12:58:47 fir-io7-s1 kernel: LNetError: 1849:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 12:58:47 fir-io7-s1 kernel: LNetError: 1849:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 341 previous similar messages Mar 17 13:05:52 fir-io7-s1 kernel: LNetError: 1849:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 13:05:52 fir-io7-s1 kernel: LNetError: 1849:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 194 previous similar messages Mar 17 13:08:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.218@o2ib7: 0 seconds Mar 17 13:08:33 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 229 previous similar messages Mar 17 13:08:52 fir-io7-s1 kernel: LNetError: 2255:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: consumer defined fatal error Mar 17 13:08:52 fir-io7-s1 kernel: LNetError: 2255:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 364 previous similar messages Mar 17 13:15:52 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 13:15:52 fir-io7-s1 kernel: LNetError: 64781:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 170 previous similar messages Mar 17 13:18:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.235@o2ib7: 0 seconds Mar 17 13:18:35 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 601 previous similar messages Mar 17 13:18:52 fir-io7-s1 kernel: LNetError: 2630:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.211@o2ib7 rejected: consumer defined fatal error Mar 17 13:18:52 fir-io7-s1 kernel: LNetError: 2630:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 345 previous similar messages Mar 17 13:25:52 fir-io7-s1 kernel: LNetError: 2630:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 13:25:52 fir-io7-s1 kernel: LNetError: 2630:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 177 previous similar messages Mar 17 13:28:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.238@o2ib7: 0 seconds Mar 17 13:28:37 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 542 previous similar messages Mar 17 13:28:52 fir-io7-s1 kernel: LNetError: 3032:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 13:28:52 fir-io7-s1 kernel: LNetError: 3032:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 376 previous similar messages Mar 17 13:35:52 fir-io7-s1 kernel: LNetError: 3402:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 13:35:52 fir-io7-s1 kernel: LNetError: 3402:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 210 previous similar messages Mar 17 13:38:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.232@o2ib7: 1 seconds Mar 17 13:38:43 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 477 previous similar messages Mar 17 13:38:57 fir-io7-s1 kernel: LNetError: 3032:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.210@o2ib7 rejected: consumer defined fatal error Mar 17 13:38:57 fir-io7-s1 kernel: LNetError: 3032:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 388 previous similar messages Mar 17 13:45:52 fir-io7-s1 kernel: LNetError: 1597:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 13:45:52 fir-io7-s1 kernel: LNetError: 1597:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 205 previous similar messages Mar 17 13:48:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.236@o2ib7: 1 seconds Mar 17 13:48:48 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 436 previous similar messages Mar 17 13:48:57 fir-io7-s1 kernel: LNetError: 115948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 13:48:57 fir-io7-s1 kernel: LNetError: 115948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 368 previous similar messages Mar 17 13:55:52 fir-io7-s1 kernel: LNetError: 3558:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 13:55:52 fir-io7-s1 kernel: LNetError: 3558:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 218 previous similar messages Mar 17 13:58:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.226@o2ib7: 0 seconds Mar 17 13:58:57 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 345 previous similar messages Mar 17 13:59:02 fir-io7-s1 kernel: LNetError: 115948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.212@o2ib7 rejected: consumer defined fatal error Mar 17 13:59:02 fir-io7-s1 kernel: LNetError: 115948:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 393 previous similar messages Mar 17 14:03:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.0.10.212@o2ib7: -125 Mar 17 14:03:09 fir-io7-s1 kernel: LNetError: 64798:0:(lib-move.c:2963:lnet_resend_pending_msgs_locked()) Skipped 4 previous similar messages Mar 17 14:05:52 fir-io7-s1 kernel: LNetError: 4245:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. 
Health = 900 Mar 17 14:05:52 fir-io7-s1 kernel: LNetError: 4245:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 237 previous similar messages Mar 17 14:09:02 fir-io7-s1 kernel: LNetError: 4245:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) 10.0.10.209@o2ib7 rejected: consumer defined fatal error Mar 17 14:09:02 fir-io7-s1 kernel: LNetError: 4245:0:(o2iblnd_cb.c:2923:kiblnd_rejected()) Skipped 389 previous similar messages Mar 17 14:09:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.2.236@o2ib7: 0 seconds Mar 17 14:09:07 fir-io7-s1 kernel: LNet: 64781:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 393 previous similar messages Mar 17 14:15:52 fir-io7-s1 kernel: LNetError: 4883:0:(lib-msg.c:485:lnet_handle_local_failure()) ni 10.0.10.113@o2ib7 added to recovery queue. Health = 900 Mar 17 14:15:52 fir-io7-s1 kernel: LNetError: 4883:0:(lib-msg.c:485:lnet_handle_local_failure()) Skipped 219 previous similar messages