HOSTS -------------------------------------------------------------------------
dac-e-10
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-05-07 14:07:27 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-10 kernel: Lustre: fs1-OST006c: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-10 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 22:55:51 dac-e-10 kernel: Lustre: fs1-OST006f: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:55:51 dac-e-10 kernel: Lustre: Skipped 85 previous similar messages
Aug 20 22:55:53 dac-e-10 kernel: Lustre: fs1-OST0076: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-10 kernel: Lustre: 92145:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff905e8c6eba80 x1642422107576528/t0(0) o105->fs1-OST0071@10.47.21.35@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5418:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5419:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5417:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5428:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5420:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5418:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5419:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5417:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5420:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5418:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5417:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5418:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5428:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5419:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5420:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:53 dac-e-10 kernel: LustreError: 5417:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9044cee8a400
Aug 20 22:55:55 dac-e-10 kernel: Lustre: fs1-OST0074: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-10 kernel: Lustre: 108324:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338156]  req@ffff905e90011680 x1642422107580032/t0(0) o104->fs1-OST006d@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338161 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:56 dac-e-10 kernel: Lustre: 108324:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 22:55:57 dac-e-10 kernel: Lustre: fs1-OST006f: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:57 dac-e-10 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 22:56:07 dac-e-10 kernel: Lustre: 88111:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338156/real 1566338156]  req@ffff904671edb180 x1642422107587136/t0(0) o105->fs1-OST0072@10.47.21.34@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 22:56:16 dac-e-10 kernel: Lustre: fs1-OST0074: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-10 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:56:49 dac-e-10 kernel: LustreError: 183185:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff90450dc86050 x1642422175275264/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-10 kernel: LustreError: 130073:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff90450dc9e050 x1642422175279264/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:123/0 lens 488/440 e 1 to 0 dl 1566338223 ref 1 fl Interpret:/2/0 rc 0/0
Aug 20 22:56:49 dac-e-10 kernel: Lustre: fs1-OST0071: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 23:01:37 dac-e-10 kernel: Lustre: fs1-OST0072: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:01:37 dac-e-10 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:01:37 dac-e-10 kernel: Lustre: fs1-OST0072: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:01:37 dac-e-10 kernel: Lustre: Skipped 32 previous similar messages
Aug 20 23:03:01 dac-e-10 kernel: Lustre: fs1-OST0075: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:01 dac-e-10 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:07:13 dac-e-10 kernel: Lustre: fs1-OST006d: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:07:13 dac-e-10 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:10 dac-e-10 kernel: Lustre: Failing over fs1-MDT0009
Aug 20 23:19:10 dac-e-10 kernel: LustreError: 11-0: fs1-MDT000e-osp-MDT0009: operation mds_disconnect to node 10.47.18.15@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-10 kernel: LustreError: 183112:0:(osp_dev.c:485:osp_disconnect()) fs1-MDT0011-osp-MDT0009: can't disconnect: rc = -19
Aug 20 23:19:10 dac-e-10 kernel: LustreError: 183112:0:(lod_dev.c:267:lod_sub_process_config()) fs1-MDT0009-mdtlov: error cleaning up LOD index 17: cmd 0xcf031 : rc = -19
Aug 20 23:19:10 dac-e-10 kernel: Lustre: server umount fs1-MDT0009 complete
Aug 20 23:19:11 dac-e-10 kernel: LustreError: 137-5: fs1-MDT0009_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-10 kernel: LustreError: 137-5: fs1-MDT0009_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-10 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:23 dac-e-10 kernel: Lustre: 185719:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339557/real 1566339557]  req@ffff9044b9fc3180 x1642422109237488/t0(0) o39->fs1-MDT0000-lwp-OST006c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339563 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-10 kernel: Lustre: 186233:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff905d0ae78d80 x1642422109242112/t0(0) o39->fs1-MDT0000-lwp-OST0075@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-10 kernel: Lustre: fs1-MDT000a-lwp-OST0077: Connection to fs1-MDT000a (at 10.47.18.11@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:25 dac-e-10 kernel: Lustre: Failing over fs1-OST0074
Aug 20 23:19:25 dac-e-10 kernel: Lustre: server umount fs1-OST0074 complete
Aug 20 23:19:26 dac-e-10 kernel: Lustre: 186717:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9046994fb600 x1642422109242128/t0(0) o39->fs1-MDT0000-lwp-OST0077@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-10 kernel: Lustre: 186717:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 276 previous similar messages
Aug 20 23:19:27 dac-e-10 kernel: Lustre: Failing over fs1-OST0076
Aug 20 23:19:27 dac-e-10 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:27 dac-e-10 kernel: Lustre: server umount fs1-OST0076 complete
Aug 20 23:19:27 dac-e-10 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-10 kernel: Lustre: 188141:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff905d0ae7ec00 x1642422109242160/t0(0) o39->fs1-MDT0000-lwp-OST0070@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-10 kernel: Lustre: 188141:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-10 kernel: Lustre: Failing over fs1-OST006c
Aug 20 23:19:29 dac-e-10 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:30 dac-e-10 kernel: Lustre: server umount fs1-OST006c complete
Aug 20 23:19:30 dac-e-10 kernel: Lustre: Skipped 3 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-14
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:46 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-14 kernel: Lustre: fs1-OST009c: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-14 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-14 kernel: Lustre: fs1-OST00a7: Connection restored to a9c07af9-d96f-6905-bdea-228af9a88046 (at 10.47.21.32@o2ib1)
Aug 20 22:55:51 dac-e-14 kernel: Lustre: Skipped 74 previous similar messages
Aug 20 22:55:53 dac-e-14 kernel: Lustre: fs1-OST00a5: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-14 kernel: LustreError: 409822:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:53 dac-e-14 kernel: LustreError: 409820:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:54 dac-e-14 kernel: Lustre: fs1-OST00a5: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409821:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409820:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409878:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409820:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409878:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:54 dac-e-14 kernel: LustreError: 409821:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:55 dac-e-14 kernel: LustreError: 409820:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:55 dac-e-14 kernel: LustreError: 409878:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:55 dac-e-14 kernel: LustreError: 409823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:55 dac-e-14 kernel: LustreError: 409822:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409820:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409822:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409821:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409878:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409820:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97766d581400
Aug 20 22:55:56 dac-e-14 kernel: LustreError: 409878:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff977541253600
Aug 20 22:55:58 dac-e-14 kernel: LNet: 409815:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:55:58 dac-e-14 kernel: Lustre: 23437:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338156/real 1566338158]  req@ffff97751376f500 x1642422107586928/t0(0) o104->fs1-OST009c@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338167 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:58 dac-e-14 kernel: Lustre: fs1-OST009c: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:59 dac-e-14 kernel: LNet: 409815:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 0 seconds
Aug 20 22:56:00 dac-e-14 kernel: Lustre: fs1-OST009c: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:00 dac-e-14 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:56:00 dac-e-14 kernel: LNet: 409815:0:(o2iblnd_cb.c:1495:kiblnd_reconnect_peer()) Abort reconnection of 10.47.21.34@o2ib1: connected
Aug 20 22:56:00 dac-e-14 kernel: LNet: 409815:0:(o2iblnd_cb.c:1495:kiblnd_reconnect_peer()) Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-14 kernel: Lustre: fs1-OST00a4: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-14 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:56:16 dac-e-14 kernel: LustreError: 400545:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff977546bca850 x1642422175274800/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:16 dac-e-14 kernel: Lustre: fs1-OST00a7: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:16 dac-e-14 kernel: LustreError: 400545:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 1 previous similar message
Aug 20 23:00:46 dac-e-14 kernel: Lustre: fs1-OST009c: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 23:00:46 dac-e-14 kernel: Lustre: Skipped 44 previous similar messages
Aug 20 23:01:14 dac-e-14 kernel: Lustre: fs1-OST009d: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:01:14 dac-e-14 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:01:41 dac-e-14 kernel: Lustre: fs1-MDT000d: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:04:03 dac-e-14 kernel: Lustre: fs1-OST009f: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-14 kernel: Lustre: fs1-OST00a5: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-14 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:10 dac-e-14 kernel: Lustre: Failing over fs1-MDT000d
Aug 20 23:19:10 dac-e-14 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT000d: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-14 kernel: Lustre: server umount fs1-MDT000d complete
Aug 20 23:19:11 dac-e-14 kernel: LustreError: 137-5: fs1-MDT000d_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-14 kernel: LustreError: 137-5: fs1-MDT000d_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-14 kernel: Lustre: 270758:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff97753464b600 x1642422109241648/t0(0) o39->fs1-MDT0000-lwp-OST009c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-14 kernel: Lustre: 271549:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff97751e7fb600 x1642422109241664/t0(0) o39->fs1-MDT0000-lwp-OST00a5@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-14 kernel: Lustre: 272349:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff977534648000 x1642422109241680/t0(0) o39->fs1-MDT0000-lwp-OST00a7@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-14 kernel: Lustre: 273212:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff977544705100 x1642422109241712/t0(0) o39->fs1-MDT0000-lwp-OST00a0@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-14 kernel: Lustre: 273212:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:33 dac-e-14 kernel: Lustre: 273892:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff9776d0d6a880 x1642422109241808/t0(0) o39->fs1-MDT0000-lwp-OST00a6@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-14 kernel: Lustre: 273892:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Aug 20 23:19:34 dac-e-14 kernel: Lustre: fs1-MDT0015-lwp-OST009c: Connection to fs1-MDT0015 (at 10.47.18.22@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:34 dac-e-14 kernel: Lustre: Failing over fs1-OST009d
Aug 20 23:19:34 dac-e-14 kernel: Lustre: server umount fs1-OST009d complete
Aug 20 23:19:36 dac-e-14 kernel: Lustre: Failing over fs1-OST009c
Aug 20 23:19:36 dac-e-14 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:36 dac-e-14 kernel: Lustre: server umount fs1-OST009c complete
Aug 20 23:19:36 dac-e-14 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:38 dac-e-14 kernel: Lustre: Failing over fs1-OST00a7
Aug 20 23:19:38 dac-e-14 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:38 dac-e-14 kernel: Lustre: server umount fs1-OST00a7 complete
Aug 20 23:19:38 dac-e-14 kernel: Lustre: Skipped 5 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-1
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-04-16 21:34:08 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-1 kernel: Lustre: fs1-OST0000: Connection restored to 1dd0df5c-469f-6643-3e7e-a97b4a721aca (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-1 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-1 kernel: Lustre: fs1-OST000a: Connection restored to 04d1253b-42fc-c7bf-5c37-ae05806b8c39 (at 10.47.20.69@o2ib1)
Aug 20 22:55:51 dac-e-1 kernel: Lustre: Skipped 82 previous similar messages
Aug 20 22:55:55 dac-e-1 kernel: Lustre: fs1-OST0008: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:23 dac-e-1 kernel: Lustre: fs1-OST0003: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:03:07 dac-e-1 kernel: Lustre: fs1-OST000b: Connection restored to bfa3419e-5f60-2112-6155-b28d2a4d1c73 (at 10.47.21.38@o2ib1)
Aug 20 23:03:07 dac-e-1 kernel: Lustre: Skipped 28 previous similar messages
Aug 20 23:03:24 dac-e-1 kernel: Lustre: fs1-OST0004: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:07:13 dac-e-1 kernel: Lustre: fs1-OST0001: Connection restored to 954dafb5-f1aa-2bcb-9b19-73924f785d18 (at 10.47.21.34@o2ib1)
Aug 20 23:07:13 dac-e-1 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:10 dac-e-1 kernel: Lustre: Failing over fs1-MDT0000
Aug 20 23:19:10 dac-e-1 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT0000: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-1 kernel: LustreError: 312347:0:(osp_dev.c:485:osp_disconnect()) fs1-MDT0015-osp-MDT0000: can't disconnect: rc = -19
Aug 20 23:19:10 dac-e-1 kernel: LustreError: 312347:0:(lod_dev.c:267:lod_sub_process_config()) fs1-MDT0000-mdtlov: error cleaning up LOD index 21: cmd 0xcf031 : rc = -19
Aug 20 23:19:10 dac-e-1 kernel: Lustre: server umount fs1-MDT0000 complete
Aug 20 23:19:11 dac-e-1 kernel: LustreError: 137-5: fs1-MDT0000_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:23 dac-e-1 kernel: Lustre: 314693:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339557/real 1566339557]  req@ffff8abfebb51b00 x1642422108305392/t0(0) o39->fs1-MDT0000-lwp-OST0000@0@lo:12/10 lens 224/224 e 0 to 1 dl 1566339563 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-1 kernel: Lustre: 314718:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff8ad73c3ff080 x1642422108305408/t0(0) o39->fs1-MDT0000-lwp-OST0009@0@lo:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-1 kernel: Lustre: 314743:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff8ad73c3fda00 x1642422108305424/t0(0) o39->fs1-MDT0000-lwp-OST000b@0@lo:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-1 kernel: Lustre: 314796:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff8abfebb50000 x1642422108305456/t0(0) o39->fs1-MDT0000-lwp-OST0004@0@lo:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-1 kernel: Lustre: 314796:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:30 dac-e-1 kernel: Lustre: fs1-MDT0008-lwp-OST0005: Connection to fs1-MDT0008 (at 10.47.18.9@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:31 dac-e-1 kernel: Lustre: Failing over fs1-OST0009
Aug 20 23:19:31 dac-e-1 kernel: Lustre: server umount fs1-OST0009 complete
Aug 20 23:19:32 dac-e-1 kernel: Lustre: Failing over fs1-OST000b
Aug 20 23:19:32 dac-e-1 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:32 dac-e-1 kernel: Lustre: server umount fs1-OST000b complete
Aug 20 23:19:32 dac-e-1 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-1 kernel: Lustre: 316195:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff8ad73c3f9200 x1642422108310096/t0(0) o39->fs1-MDT0000-lwp-OST000a@0@lo:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-1 kernel: Lustre: 316195:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 270 previous similar messages
Aug 20 23:19:34 dac-e-1 kernel: Lustre: Failing over fs1-OST0001
Aug 20 23:19:34 dac-e-1 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:34 dac-e-1 kernel: Lustre: server umount fs1-OST0001 complete
Aug 20 23:19:34 dac-e-1 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:53 dac-e-1 kernel: Lustre: 322695:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339587/real 1566339587]  req@ffff8ad823410480 x1642422108314624/t0(0) o251->MGC10.47.18.1@o2ib1@0@lo:26/25 lens 224/224 e 0 to 1 dl 1566339593 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:53 dac-e-1 kernel: Lustre: 322695:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Aug 20 23:19:53 dac-e-1 kernel: Lustre: server umount MGS complete
Aug 20 23:19:53 dac-e-1 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-11
-------------------------------------------------------------------------------
-- Logs begin at Thu 2019-03-21 15:42:22 GMT, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:55:51 dac-e-11 kernel: Lustre: fs1-OST007d: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 22:55:51 dac-e-11 kernel: Lustre: fs1-OST0083: Connection restored to a9c07af9-d96f-6905-bdea-228af9a88046 (at 10.47.21.32@o2ib1)
Aug 20 22:55:51 dac-e-11 kernel: Lustre: Skipped 93 previous similar messages
Aug 20 22:55:51 dac-e-11 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:55 dac-e-11 kernel: Lustre: fs1-OST0078: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-11 kernel: Lustre: 140799:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338156]  req@ffff9847013df980 x1642422107580496/t0(0) o13->fs1-OST00d6-osc-MDT000a@10.47.18.18@o2ib1:7/4 lens 224/368 e 0 to 1 dl 1566338161 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:56 dac-e-11 kernel: Lustre: fs1-OST00d6-osc-MDT000a: Connection to fs1-OST00d6 (at 10.47.18.18@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 22:55:57 dac-e-11 kernel: Lustre: fs1-OST007f: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:59 dac-e-11 kernel: Lustre: fs1-OST0083: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:59 dac-e-11 kernel: Lustre: 140777:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338152/real 1566338152]  req@ffff984759bd7500 x1642422107578848/t0(0) o13->fs1-OST00d5-osc-MDT000a@10.47.18.18@o2ib1:7/4 lens 224/368 e 0 to 1 dl 1566338159 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 22:55:59 dac-e-11 kernel: Lustre: fs1-OST00d5-osc-MDT000a: Connection to fs1-OST00d5 (at 10.47.18.18@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 22:56:02 dac-e-11 kernel: Lustre: fs1-MDT000a: Client fs1-MDT000a-lwp-OST00d3_UUID (at 10.47.18.18@o2ib1) reconnecting
Aug 20 22:56:02 dac-e-11 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-11 kernel: Lustre: fs1-OST0078: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:41 dac-e-11 kernel: Lustre: fs1-OST007e: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:41 dac-e-11 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:57:14 dac-e-11 kernel: LustreError: 235256:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 14680064(16777216)  req@ffff984827753050 x1642422175286176/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:152/0 lens 488/440 e 2 to 0 dl 1566338252 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:57:14 dac-e-11 kernel: Lustre: fs1-OST007a: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 23:03:33 dac-e-11 kernel: Lustre: fs1-OST007a: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:07:13 dac-e-11 kernel: Lustre: fs1-OST0079: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:07:13 dac-e-11 kernel: Lustre: Skipped 37 previous similar messages
Aug 20 23:19:10 dac-e-11 kernel: Lustre: Failing over fs1-MDT000a
Aug 20 23:19:10 dac-e-11 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT000a: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-11 kernel: Lustre: server umount fs1-MDT000a complete
Aug 20 23:19:11 dac-e-11 kernel: LustreError: 137-5: fs1-MDT000a_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:23 dac-e-11 kernel: Lustre: 245494:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339557/real 1566339557]  req@ffff98474772e780 x1642422109238192/t0(0) o39->fs1-MDT0000-lwp-OST0078@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339563 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-11 kernel: Lustre: 246276:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff9830e5a4ad00 x1642422109238208/t0(0) o39->fs1-MDT0000-lwp-OST0081@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-11 kernel: Lustre: 247081:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9830cae40000 x1642422109238224/t0(0) o39->fs1-MDT0000-lwp-OST0083@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-11 kernel: Lustre: 247918:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff9830cae40480 x1642422109238256/t0(0) o39->fs1-MDT0000-lwp-OST007c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-11 kernel: Lustre: 247918:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:30 dac-e-11 kernel: Lustre: fs1-MDT0010-lwp-OST007f: Connection to fs1-MDT0010 (at 10.47.18.17@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:31 dac-e-11 kernel: Lustre: Failing over fs1-OST0081
Aug 20 23:19:31 dac-e-11 kernel: Lustre: server umount fs1-OST0081 complete
Aug 20 23:19:32 dac-e-11 kernel: Lustre: Failing over fs1-OST0083
Aug 20 23:19:32 dac-e-11 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:32 dac-e-11 kernel: Lustre: server umount fs1-OST0083 complete
Aug 20 23:19:32 dac-e-11 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-11 kernel: Lustre: 247997:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff984740536300 x1642422109242896/t0(0) o39->fs1-MDT0000-lwp-OST0082@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-11 kernel: Lustre: 247997:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 275 previous similar messages
Aug 20 23:19:34 dac-e-11 kernel: Lustre: Failing over fs1-OST0079
Aug 20 23:19:34 dac-e-11 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:34 dac-e-11 kernel: Lustre: server umount fs1-OST0079 complete
Aug 20 23:19:34 dac-e-11 kernel: Lustre: Skipped 3 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-13
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:45 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-13 kernel: Lustre: fs1-OST0090: Connection restored to  (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-13 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 22:55:51 dac-e-13 kernel: Lustre: fs1-OST009a: Connection restored to  (at 10.47.21.35@o2ib1)
Aug 20 22:55:51 dac-e-13 kernel: Lustre: Skipped 81 previous similar messages
Aug 20 22:55:53 dac-e-13 kernel: Lustre: 154918:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff9a6588f48900 x1642422107578016/t0(0) o104->fs1-OST009b@10.47.21.35@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338158 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:53 dac-e-13 kernel: LustreError: 49914:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9a4d5f311a00
Aug 20 22:55:53 dac-e-13 kernel: LustreError: 49913:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9a4d5f311a00
Aug 20 22:55:53 dac-e-13 kernel: LustreError: 49916:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9a4d5f311a00
Aug 20 22:55:53 dac-e-13 kernel: LustreError: 49915:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9a4d5f311a00
Aug 20 22:55:55 dac-e-13 kernel: Lustre: fs1-OST0090: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-13 kernel: Lustre: 299250:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338156]  req@ffff9a4d96b4c380 x1642422107582448/t0(0) o105->fs1-OST009b@10.47.21.34@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338161 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:56 dac-e-13 kernel: LNet: 49915:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.37@o2ib1
Aug 20 22:55:56 dac-e-13 kernel: LNet: 49915:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 1 previous similar message
Aug 20 22:55:57 dac-e-13 kernel: Lustre: fs1-OST0096: Client b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1) reconnecting
Aug 20 22:55:57 dac-e-13 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:55:57 dac-e-13 kernel: LustreError: 180360:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9a66a3264050 x1642422175279136/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:87/0 lens 488/440 e 1 to 0 dl 1566338187 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:57 dac-e-13 kernel: Lustre: fs1-OST0099: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:55:58 dac-e-13 kernel: Lustre: fs1-OST009b: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-13 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-13 kernel: Lustre: fs1-OST0090: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:22 dac-e-13 kernel: Lustre: fs1-OST0093: Client b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1) reconnecting
Aug 20 22:56:22 dac-e-13 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:56:22 dac-e-13 kernel: LustreError: 180356:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9a4ece400050 x1642422175276912/t0(0) o3->b70bd8d1-bef1-cafa-1d50-bfa93684ff22@10.47.21.37@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:22 dac-e-13 kernel: Lustre: fs1-OST0093: Bulk IO read error with b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-13 kernel: LustreError: 258104:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff9a4ecda5a850 x1642422175276848/t0(0) o3->b70bd8d1-bef1-cafa-1d50-bfa93684ff22@10.47.21.37@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-13 kernel: LustreError: 254101:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff9a4ecda5c050 x1642422175276432/t0(0) o3->b70bd8d1-bef1-cafa-1d50-bfa93684ff22@10.47.21.37@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-13 kernel: Lustre: fs1-OST0090: Bulk IO read error with b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1), client will retry: rc -110
Aug 20 23:00:46 dac-e-13 kernel: Lustre: fs1-OST0090: Client b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1) reconnecting
Aug 20 23:00:46 dac-e-13 kernel: Lustre: fs1-OST0090: Connection restored to  (at 10.47.21.37@o2ib1)
Aug 20 23:00:46 dac-e-13 kernel: Lustre: Skipped 40 previous similar messages
Aug 20 23:03:01 dac-e-13 kernel: Lustre: fs1-OST0095: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:01 dac-e-13 kernel: Lustre: fs1-OST0095: Connection restored to  (at 10.47.21.31@o2ib1)
Aug 20 23:03:01 dac-e-13 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:09:09 dac-e-13 kernel: Lustre: fs1-OST009b: Connection restored to  (at 10.47.20.68@o2ib1)
Aug 20 23:19:10 dac-e-13 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT000c: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-13 kernel: Lustre: Failing over fs1-MDT000c
Aug 20 23:19:10 dac-e-13 kernel: LustreError: 140444:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff9a654ffb9f80 x1642422109237536/t0(0) o41->fs1-MDT0006-osp-MDT000c@10.47.18.7@o2ib1:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 20 23:19:10 dac-e-13 kernel: Lustre: server umount fs1-MDT000c complete
Aug 20 23:19:11 dac-e-13 kernel: LustreError: 137-5: fs1-MDT000c_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:11 dac-e-13 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:12 dac-e-13 kernel: LustreError: 137-5: fs1-MDT000c_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:23 dac-e-13 kernel: Lustre: 246051:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339557/real 1566339557]  req@ffff9a65073dba80 x1642422109238208/t0(0) o39->fs1-MDT0000-lwp-OST0090@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339563 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-13 kernel: Lustre: 247425:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff9a656351de80 x1642422109238224/t0(0) o39->fs1-MDT0000-lwp-OST0099@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-13 kernel: Lustre: 247453:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9a4d6bb34800 x1642422109242832/t0(0) o39->fs1-MDT0000-lwp-OST009b@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-13 kernel: Lustre: fs1-MDT0007-lwp-OST0095: Connection to fs1-MDT0007 (at 10.47.18.8@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:27 dac-e-13 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:27 dac-e-13 kernel: Lustre: Failing over fs1-OST009a
Aug 20 23:19:27 dac-e-13 kernel: Lustre: server umount fs1-OST009a complete
Aug 20 23:19:28 dac-e-13 kernel: Lustre: Failing over fs1-OST0091
Aug 20 23:19:28 dac-e-13 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-13 kernel: Lustre: server umount fs1-OST0091 complete
Aug 20 23:19:28 dac-e-13 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-13 kernel: Lustre: 247506:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff9a4d6bb30480 x1642422109242864/t0(0) o39->fs1-MDT0000-lwp-OST0094@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-13 kernel: Lustre: 247506:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 276 previous similar messages
Aug 20 23:19:31 dac-e-13 kernel: Lustre: Failing over fs1-OST0099
Aug 20 23:19:31 dac-e-13 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:31 dac-e-13 kernel: Lustre: server umount fs1-OST0099 complete
Aug 20 23:19:31 dac-e-13 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-12
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:45 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-12 kernel: Lustre: fs1-OST0084: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-12 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-12 kernel: Lustre: fs1-OST008f: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 22:55:51 dac-e-12 kernel: Lustre: Skipped 80 previous similar messages
Aug 20 22:55:53 dac-e-12 kernel: Lustre: fs1-OST008f: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-12 kernel: LustreError: 238117:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9d2f245bc850 x1642422175274784/t0(0) o3->4f2faf4f-1754-0923-7bb5-26c935576df5@10.47.21.36@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:55 dac-e-12 kernel: Lustre: fs1-OST0085: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-12 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50963:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d304fbb1200
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50959:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d17c3788e00
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50967:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d17c3788e00
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50958:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d17c3788e00
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50956:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d17c3788e00
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50961:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d304fbb1200
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 50962:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9d304fbb1200
Aug 20 22:55:56 dac-e-12 kernel: Lustre: fs1-OST008e: Bulk IO read error with 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-12 kernel: LustreError: 238125:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9d17f7c68050 x1642422175274880/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 23:01:41 dac-e-12 kernel: Lustre: fs1-OST0089: Connection restored to d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1)
Aug 20 23:01:41 dac-e-12 kernel: Lustre: Skipped 31 previous similar messages
Aug 20 23:09:09 dac-e-12 kernel: Lustre: fs1-OST008f: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-12 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:10 dac-e-12 kernel: Lustre: Failing over fs1-MDT000b
Aug 20 23:19:10 dac-e-12 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT000b: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-12 kernel: LustreError: 141438:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff9d17216f4380 x1642422109237008/t0(0) o41->fs1-MDT0003-osp-MDT000b@10.47.18.4@o2ib1:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 20 23:19:10 dac-e-12 kernel: Lustre: server umount fs1-MDT000b complete
Aug 20 23:19:11 dac-e-12 kernel: LustreError: 137-5: fs1-MDT000b_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-12 kernel: LustreError: 137-5: fs1-MDT000b_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-12 kernel: LustreError: 137-5: fs1-MDT000b_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:23 dac-e-12 kernel: Lustre: 249717:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339557/real 1566339557]  req@ffff9d2ec15f8d80 x1642422109237056/t0(0) o39->fs1-MDT0000-lwp-OST0084@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339563 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-12 kernel: Lustre: 250506:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff9d2ea0751b00 x1642422109237072/t0(0) o39->fs1-MDT0000-lwp-OST008d@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-12 kernel: Lustre: 251308:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9d2ec15fba80 x1642422109237088/t0(0) o39->fs1-MDT0000-lwp-OST008f@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-12 kernel: Lustre: 252767:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff9d2ec15f9f80 x1642422109237120/t0(0) o39->fs1-MDT0000-lwp-OST0088@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-12 kernel: Lustre: 252767:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:33 dac-e-12 kernel: Lustre: 253509:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff9d2ea0755100 x1642422109241696/t0(0) o39->fs1-MDT0000-lwp-OST008e@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-12 kernel: Lustre: 253509:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Aug 20 23:19:33 dac-e-12 kernel: Lustre: fs1-MDT0002-lwp-OST008b: Connection to fs1-MDT0002 (at 10.47.18.3@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:33 dac-e-12 kernel: Lustre: Failing over fs1-OST0086
Aug 20 23:19:33 dac-e-12 kernel: Lustre: server umount fs1-OST0086 complete
Aug 20 23:19:34 dac-e-12 kernel: Lustre: Failing over fs1-OST0085
Aug 20 23:19:34 dac-e-12 kernel: Lustre: server umount fs1-OST0085 complete
Aug 20 23:19:37 dac-e-12 kernel: Lustre: Failing over fs1-OST008d
Aug 20 23:19:37 dac-e-12 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:37 dac-e-12 kernel: Lustre: server umount fs1-OST008d complete
Aug 20 23:19:37 dac-e-12 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-16
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:46 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-16 kernel: Lustre: fs1-OST00b4: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-16 kernel: Lustre: Skipped 8 previous similar messages
Aug 20 22:55:51 dac-e-16 kernel: Lustre: fs1-OST00b6: Connection restored to ba6d10d8-29ae-af16-bb29-fe0009074454 (at 10.47.21.33@o2ib1)
Aug 20 22:55:51 dac-e-16 kernel: Lustre: Skipped 74 previous similar messages
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65244:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f900ea71800
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65242:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f900ea71800
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65243:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f900ea71800
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65252:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8f900ea71800
Aug 20 22:55:53 dac-e-16 kernel: Lustre: fs1-OST00be: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65248:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa871be1a00
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65246:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa871be1a00
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa871be1a00
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65245:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa871be1a00
Aug 20 22:55:53 dac-e-16 kernel: Lustre: 175820:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff8fa7e5a5cc80 x1642422107578000/t0(0) o104->fs1-OST00b4@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338158 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa845970000
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65246:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa845970000
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65248:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa845970000
Aug 20 22:55:53 dac-e-16 kernel: LustreError: 65245:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fa845970000
Aug 20 22:55:54 dac-e-16 kernel: Lustre: fs1-OST00b7: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-16 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:54 dac-e-16 kernel: LustreError: 194465:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8f90a5b72850 x1642422175274320/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-16 kernel: Lustre: fs1-OST00bb: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:55:55 dac-e-16 kernel: Lustre: fs1-OST00b8: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-16 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:55:55 dac-e-16 kernel: LustreError: 248250:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8fa7dd0cb850 x1642422175274096/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:55 dac-e-16 kernel: Lustre: fs1-OST00be: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:55:55 dac-e-16 kernel: Lustre: 175820:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338153/real 1566338155]  req@ffff8fa7e5a5cc80 x1642422107578000/t0(0) o104->fs1-OST00b4@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338160 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Aug 20 22:55:57 dac-e-16 kernel: Lustre: fs1-OST00b4: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:57 dac-e-16 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:56:16 dac-e-16 kernel: Lustre: fs1-OST00b8: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-16 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:49 dac-e-16 kernel: LustreError: 279548:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff8fa81c346850 x1642422175278480/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:134/0 lens 488/440 e 3 to 0 dl 1566338234 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-16 kernel: Lustre: fs1-OST00b8: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 23:01:08 dac-e-16 kernel: Lustre: fs1-OST00bd: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:01:08 dac-e-16 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:01:08 dac-e-16 kernel: Lustre: fs1-OST00bd: Connection restored to 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1)
Aug 20 23:01:08 dac-e-16 kernel: Lustre: Skipped 41 previous similar messages
Aug 20 23:03:25 dac-e-16 kernel: Lustre: fs1-OST00b8: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:25 dac-e-16 kernel: Lustre: fs1-OST00b8: Connection restored to 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1)
Aug 20 23:03:25 dac-e-16 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:09:09 dac-e-16 kernel: Lustre: fs1-OST00bd: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:19:10 dac-e-16 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT000f: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-16 kernel: Lustre: Failing over fs1-MDT000f
Aug 20 23:19:10 dac-e-16 kernel: Lustre: server umount fs1-MDT000f complete
Aug 20 23:19:12 dac-e-16 kernel: LustreError: 137-5: fs1-MDT000f_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-16 kernel: LustreError: 137-5: fs1-MDT000f_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-16 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:24 dac-e-16 kernel: Lustre: 262053:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff8fa95777b180 x1642422109237680/t0(0) o39->fs1-MDT0000-lwp-OST00b4@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-16 kernel: Lustre: 262165:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff8fa955b62400 x1642422109237696/t0(0) o39->fs1-MDT0000-lwp-OST00bd@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-16 kernel: Lustre: 262191:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff8fa955b61680 x1642422109237712/t0(0) o39->fs1-MDT0000-lwp-OST00bf@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-16 kernel: Lustre: 262250:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff8fa955b64800 x1642422109242304/t0(0) o39->fs1-MDT0000-lwp-OST00b8@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-16 kernel: Lustre: 262250:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-16 kernel: Lustre: fs1-MDT0003-lwp-OST00b9: Connection to fs1-MDT0003 (at 10.47.18.4@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:30 dac-e-16 kernel: Lustre: Failing over fs1-OST00b4
Aug 20 23:19:30 dac-e-16 kernel: Lustre: server umount fs1-OST00ba complete
Aug 20 23:19:31 dac-e-16 kernel: Lustre: Failing over fs1-OST00bd
Aug 20 23:19:31 dac-e-16 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:31 dac-e-16 kernel: Lustre: server umount fs1-OST00bd complete
Aug 20 23:19:31 dac-e-16 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-16 kernel: Lustre: 262927:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff8fa955b66780 x1642422109242400/t0(0) o39->fs1-MDT0000-lwp-OST00be@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-16 kernel: Lustre: 262927:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 278 previous similar messages
Aug 20 23:19:33 dac-e-16 kernel: Lustre: Failing over fs1-OST00b6
Aug 20 23:19:33 dac-e-16 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:33 dac-e-16 kernel: Lustre: server umount fs1-OST00b6 complete
Aug 20 23:19:33 dac-e-16 kernel: Lustre: Skipped 5 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-15
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:46 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-15 kernel: Lustre: fs1-OST00a8: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-15 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-15 kernel: Lustre: fs1-OST00b0: Connection restored to ba6d10d8-29ae-af16-bb29-fe0009074454 (at 10.47.21.33@o2ib1)
Aug 20 22:55:51 dac-e-15 kernel: Lustre: Skipped 75 previous similar messages
Aug 20 22:55:55 dac-e-15 kernel: Lustre: fs1-OST00af: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:07 dac-e-15 kernel: Lustre: 16523:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338156/real 1566338156]  req@ffff996a476b3180 x1642422107585072/t0(0) o104->fs1-OST00ac@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 22:56:16 dac-e-15 kernel: Lustre: fs1-OST00ad: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-15 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:49 dac-e-15 kernel: LustreError: 436887:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff996b9828f850 x1642422175280432/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:118/0 lens 488/440 e 2 to 0 dl 1566338218 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-15 kernel: Lustre: fs1-OST00ae: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 23:01:14 dac-e-15 kernel: Lustre: fs1-OST00a9: Connection restored to d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1)
Aug 20 23:01:14 dac-e-15 kernel: Lustre: Skipped 36 previous similar messages
Aug 20 23:01:59 dac-e-15 kernel: Lustre: fs1-OST00ae: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-15 kernel: Lustre: fs1-OST00b1: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-15 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:10 dac-e-15 kernel: Lustre: Failing over fs1-MDT000e
Aug 20 23:19:10 dac-e-15 kernel: LustreError: 11-0: fs1-MDT0009-osp-MDT000e: operation mds_disconnect to node 10.47.18.10@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-15 kernel: Lustre: server umount fs1-MDT000e complete
Aug 20 23:19:11 dac-e-15 kernel: LustreError: 137-5: fs1-MDT000e_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:23 dac-e-15 kernel: Lustre: 50806:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339557/real 1566339557]  req@ffff996bd2a21b00 x1642422109238432/t0(0) o39->fs1-MDT0000-lwp-OST00a8@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339563 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-15 kernel: Lustre: 50831:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff99521cb49b00 x1642422109238448/t0(0) o39->fs1-MDT0000-lwp-OST00b1@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-15 kernel: Lustre: 50856:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff996bd2a25a00 x1642422109238464/t0(0) o39->fs1-MDT0000-lwp-OST00b3@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-15 kernel: Lustre: 50915:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff996bd2a21680 x1642422109243056/t0(0) o39->fs1-MDT0000-lwp-OST00ac@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-15 kernel: Lustre: 50915:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-15 kernel: Lustre: fs1-MDT000a-lwp-OST00af: Connection to fs1-MDT000a (at 10.47.18.11@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:29 dac-e-15 kernel: Lustre: Failing over fs1-OST00a8
Aug 20 23:19:30 dac-e-15 kernel: Lustre: server umount fs1-OST00a8 complete
Aug 20 23:19:31 dac-e-15 kernel: Lustre: Failing over fs1-OST00b1
Aug 20 23:19:31 dac-e-15 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:31 dac-e-15 kernel: Lustre: server umount fs1-OST00b1 complete
Aug 20 23:19:31 dac-e-15 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-15 kernel: Lustre: 51593:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff996bd2a22400 x1642422109243152/t0(0) o39->fs1-MDT0000-lwp-OST00b2@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-15 kernel: Lustre: 51593:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 274 previous similar messages
Aug 20 23:19:33 dac-e-15 kernel: Lustre: Failing over fs1-OST00aa
Aug 20 23:19:33 dac-e-15 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:33 dac-e-15 kernel: Lustre: server umount fs1-OST00aa complete
Aug 20 23:19:33 dac-e-15 kernel: Lustre: Skipped 5 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-17
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-17 kernel: Lustre: fs1-OST00c0: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-17 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-17 kernel: Lustre: fs1-OST00c8: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 22:55:51 dac-e-17 kernel: Lustre: fs1-OST00c1: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 22:55:51 dac-e-17 kernel: Lustre: Skipped 79 previous similar messages
Aug 20 22:55:53 dac-e-17 kernel: Lustre: fs1-OST00c9: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-17 kernel: LustreError: 126064:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64cce33c00
Aug 20 22:55:53 dac-e-17 kernel: LustreError: 126063:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64cce33c00
Aug 20 22:55:53 dac-e-17 kernel: LustreError: 126066:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64cce33c00
Aug 20 22:55:53 dac-e-17 kernel: LNet: 126066:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.31@o2ib1
Aug 20 22:55:53 dac-e-17 kernel: LustreError: 311136:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9f7d0bd5c850 x1642422175277424/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-17 kernel: Lustre: fs1-OST00c9: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 22:55:55 dac-e-17 kernel: Lustre: fs1-OST00c5: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-17 kernel: Lustre: 232926:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338153/real 1566338155]  req@ffff9f7c623d2400 x1642422107582528/t0(0) o104->fs1-OST00c0@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338164 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:55 dac-e-17 kernel: LustreError: 126063:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f7ca8f41e00
Aug 20 22:55:55 dac-e-17 kernel: LustreError: 126065:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f7ca8f41e00
Aug 20 22:55:55 dac-e-17 kernel: LustreError: 126065:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f7ca8f41e00
Aug 20 22:55:55 dac-e-17 kernel: LustreError: 126066:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f7ca8f41e00
Aug 20 22:55:56 dac-e-17 kernel: LustreError: 126060:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64d7e03e00
Aug 20 22:55:56 dac-e-17 kernel: LustreError: 126059:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64d7e03e00
Aug 20 22:55:56 dac-e-17 kernel: LustreError: 126061:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64d7e03e00
Aug 20 22:55:56 dac-e-17 kernel: LustreError: 126116:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9f64d7e03e00
Aug 20 22:55:56 dac-e-17 kernel: LNet: 126059:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.38@o2ib1
Aug 20 22:55:56 dac-e-17 kernel: LNet: 126059:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 6 previous similar messages
Aug 20 22:55:56 dac-e-17 kernel: LustreError: 255373:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9f7d9ee27850 x1642422175274304/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-17 kernel: Lustre: fs1-OST00c6: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:16 dac-e-17 kernel: Lustre: fs1-OST00c0: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-17 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 22:56:49 dac-e-17 kernel: LustreError: 311137:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff9f7d0bd5e050 x1642422175279040/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:138/0 lens 488/440 e 3 to 0 dl 1566338238 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-17 kernel: Lustre: fs1-OST00c6: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-17 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:49 dac-e-17 kernel: LustreError: 311137:0:(ldlm_lib.c:3268:target_bulk_io()) Skipped 1 previous similar message
Aug 20 23:01:42 dac-e-17 kernel: Lustre: fs1-OST00c4: Connection restored to d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1)
Aug 20 23:01:42 dac-e-17 kernel: Lustre: Skipped 39 previous similar messages
Aug 20 23:03:33 dac-e-17 kernel: Lustre: fs1-OST00c6: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:03:33 dac-e-17 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:09:09 dac-e-17 kernel: Lustre: fs1-OST00c9: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-17 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:10 dac-e-17 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0010: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-17 kernel: Lustre: Failing over fs1-MDT0010
Aug 20 23:19:10 dac-e-17 kernel: LustreError: 215574:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff9f7c37281b00 x1642422109233456/t0(0) o41->fs1-MDT0014-osp-MDT0010@10.47.18.21@o2ib1:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 20 23:19:10 dac-e-17 kernel: Lustre: server umount fs1-MDT0010 complete
Aug 20 23:19:11 dac-e-17 kernel: LustreError: 137-5: fs1-MDT0010_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-17 kernel: LustreError: 137-5: fs1-MDT0010_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-17 kernel: LustreError: 137-5: fs1-MDT0010_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-17 kernel: Lustre: 326804:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff9f7c98d75580 x1642422109237584/t0(0) o39->fs1-MDT0000-lwp-OST00c0@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:24 dac-e-17 kernel: Lustre: 326804:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:25 dac-e-17 kernel: Lustre: 326840:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff9f7c98d72880 x1642422109237600/t0(0) o39->fs1-MDT0000-lwp-OST00c9@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-17 kernel: Lustre: 327675:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9f660a6b4800 x1642422109237616/t0(0) o39->fs1-MDT0000-lwp-OST00cb@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-17 kernel: Lustre: 329221:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff9f66184bbf00 x1642422109237648/t0(0) o39->fs1-MDT0000-lwp-OST00c4@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-17 kernel: Lustre: 329221:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:30 dac-e-17 kernel: Lustre: fs1-MDT0012-lwp-OST00c7: Connection to fs1-MDT0012 (at 10.47.18.19@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:31 dac-e-17 kernel: Lustre: Failing over fs1-OST00c9
Aug 20 23:19:31 dac-e-17 kernel: Lustre: server umount fs1-OST00c9 complete
Aug 20 23:19:32 dac-e-17 kernel: Lustre: Failing over fs1-OST00cb
Aug 20 23:19:32 dac-e-17 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:32 dac-e-17 kernel: Lustre: server umount fs1-OST00cb complete
Aug 20 23:19:32 dac-e-17 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-17 kernel: Lustre: 332118:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff9f64fc249200 x1642422109242288/t0(0) o39->fs1-MDT0000-lwp-OST00ca@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-17 kernel: Lustre: 332118:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 275 previous similar messages
Aug 20 23:19:34 dac-e-17 kernel: Lustre: Failing over fs1-OST00c1
Aug 20 23:19:34 dac-e-17 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:34 dac-e-17 kernel: Lustre: server umount fs1-OST00c1 complete
Aug 20 23:19:34 dac-e-17 kernel: Lustre: Skipped 3 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-18
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:46 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:55:51 dac-e-18 kernel: Lustre: fs1-OST00ce: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:55:51 dac-e-18 kernel: Lustre: Skipped 87 previous similar messages
Aug 20 22:55:53 dac-e-18 kernel: LustreError: 157123:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96bc9adc2800
Aug 20 22:55:53 dac-e-18 kernel: Lustre: fs1-OST00d6: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-18 kernel: LustreError: 351018:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff96bb59d15850 x1642422175274112/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-18 kernel: Lustre: fs1-OST00d6: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:55:53 dac-e-18 kernel: Lustre: 253695:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff96d35acbf980 x1642422107578992/t0(0) o104->fs1-OST00d4@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 157124:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96bb0069e600
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 157180:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96bb0069e600
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 249420:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff96d42308d050 x1642422175275152/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 157181:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 157128:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 157127:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:54 dac-e-18 kernel: LustreError: 157126:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:55 dac-e-18 kernel: Lustre: fs1-OST00cc: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-18 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157129:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157126:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157181:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157127:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: Lustre: 257975:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338156]  req@ffff96d2d32bd580 x1642422107579024/t0(0) o104->fs1-OST00d4@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eXS/0/ffffffff rc -11/-1
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157128:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157126:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157127:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157181:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157129:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157128:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157126:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: LustreError: 157129:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff96d319bdf400
Aug 20 22:55:56 dac-e-18 kernel: Lustre: fs1-OST00cd: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:55:57 dac-e-18 kernel: Lustre: fs1-OST00d1: Client b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1) reconnecting
Aug 20 22:55:57 dac-e-18 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:55:57 dac-e-18 kernel: LustreError: 351010:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff96bc2d75a850 x1642422175276096/t0(0) o3->b70bd8d1-bef1-cafa-1d50-bfa93684ff22@10.47.21.37@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:57 dac-e-18 kernel: Lustre: fs1-OST00d7: Bulk IO read error with b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1), client will retry: rc -110
Aug 20 22:55:59 dac-e-18 kernel: Lustre: fs1-OST00d5: Client fs1-MDT000a-mdtlov_UUID (at 10.47.18.11@o2ib1) reconnecting
Aug 20 22:55:59 dac-e-18 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:56:02 dac-e-18 kernel: Lustre: 247080:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338154/real 1566338154]  req@ffff96d402e2ad00 x1642422107586832/t0(0) o400->fs1-MDT000a-lwp-OST00d3@10.47.18.11@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566338161 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 22:56:02 dac-e-18 kernel: Lustre: 247080:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 22:56:02 dac-e-18 kernel: Lustre: fs1-MDT000a-lwp-OST00d3: Connection to fs1-MDT000a (at 10.47.18.11@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 22:56:16 dac-e-18 kernel: Lustre: fs1-OST00ce: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:49 dac-e-18 kernel: LustreError: 249417:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff96bb59d1e050 x1642422175276112/t0(0) o3->b70bd8d1-bef1-cafa-1d50-bfa93684ff22@10.47.21.37@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-18 kernel: Lustre: fs1-OST00d0: Bulk IO read error with b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-18 kernel: LustreError: 249417:0:(ldlm_lib.c:3268:target_bulk_io()) Skipped 2 previous similar messages
Aug 20 23:00:46 dac-e-18 kernel: Lustre: fs1-OST00cc: Client b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1) reconnecting
Aug 20 23:00:46 dac-e-18 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:02:54 dac-e-18 kernel: Lustre: fs1-OST00d0: Client b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1) reconnecting
Aug 20 23:02:54 dac-e-18 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:10 dac-e-18 kernel: Lustre: Failing over fs1-MDT0011
Aug 20 23:19:10 dac-e-18 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT0011: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-18 kernel: Lustre: fs1-MDT0011: Not available for connect from 10.47.18.23@o2ib1 (stopping)
Aug 20 23:19:10 dac-e-18 kernel: Lustre: server umount fs1-MDT0011 complete
Aug 20 23:19:10 dac-e-18 kernel: LustreError: 137-5: fs1-MDT0011_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:11 dac-e-18 kernel: LustreError: 137-5: fs1-MDT0011_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:11 dac-e-18 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:24 dac-e-18 kernel: Lustre: 363257:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff96d2fefc2d00 x1642422109238144/t0(0) o39->fs1-MDT0000-lwp-OST00cc@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-18 kernel: Lustre: 363283:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff96d2fefc6780 x1642422109238160/t0(0) o39->fs1-MDT0000-lwp-OST00d5@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-18 kernel: Lustre: 363694:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff96bc8ffbf980 x1642422109238176/t0(0) o39->fs1-MDT0000-lwp-OST00d7@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-18 kernel: Lustre: 365219:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff96d4130c8900 x1642422109238208/t0(0) o39->fs1-MDT0000-lwp-OST00d0@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-18 kernel: Lustre: 365219:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:31 dac-e-18 kernel: Lustre: fs1-MDT0004-lwp-OST00d1: Connection to fs1-MDT0004 (at 10.47.18.5@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:32 dac-e-18 kernel: Lustre: Failing over fs1-OST00d7
Aug 20 23:19:32 dac-e-18 kernel: Lustre: server umount fs1-OST00d7 complete
Aug 20 23:19:33 dac-e-18 kernel: Lustre: 368498:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff96bc8ffb9680 x1642422109242816/t0(0) o39->fs1-MDT0000-lwp-OST00d6@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-18 kernel: Lustre: 368498:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 268 previous similar messages
Aug 20 23:19:33 dac-e-18 kernel: Lustre: Failing over fs1-OST00ce
Aug 20 23:19:33 dac-e-18 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-18 kernel: Lustre: server umount fs1-OST00ce complete
Aug 20 23:19:33 dac-e-18 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:36 dac-e-18 kernel: Lustre: Failing over fs1-OST00cc
Aug 20 23:19:36 dac-e-18 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:36 dac-e-18 kernel: Lustre: server umount fs1-OST00cc complete
Aug 20 23:19:36 dac-e-18 kernel: Lustre: Skipped 2 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-2
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:46 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-2 kernel: Lustre: fs1-OST000c: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-2 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-2 kernel: Lustre: fs1-OST0012: Connection restored to a9c07af9-d96f-6905-bdea-228af9a88046 (at 10.47.21.32@o2ib1)
Aug 20 22:55:51 dac-e-2 kernel: Lustre: Skipped 78 previous similar messages
Aug 20 22:55:55 dac-e-2 kernel: Lustre: fs1-OST0013: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-2 kernel: Lustre: fs1-OST0015: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-2 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:56:01 dac-e-2 kernel: Lustre: fs1-OST0016: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:07 dac-e-2 kernel: Lustre: 307721:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338156/real 1566338156]  req@ffff8e275bd09b00 x1642422107587728/t0(0) o104->fs1-OST0014@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 22:56:16 dac-e-2 kernel: Lustre: fs1-OST0017: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:23 dac-e-2 kernel: Lustre: fs1-OST0016: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:56:23 dac-e-2 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:00:46 dac-e-2 kernel: Lustre: fs1-OST0016: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 23:00:46 dac-e-2 kernel: Lustre: Skipped 40 previous similar messages
Aug 20 23:01:42 dac-e-2 kernel: Lustre: fs1-OST0012: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:02:55 dac-e-2 kernel: Lustre: fs1-OST0010: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 23:02:55 dac-e-2 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:09:09 dac-e-2 kernel: Lustre: fs1-OST000c: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:19:10 dac-e-2 kernel: Lustre: Failing over fs1-MDT0001
Aug 20 23:19:10 dac-e-2 kernel: LustreError: 11-0: fs1-MDT0009-osp-MDT0001: operation mds_disconnect to node 10.47.18.10@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-2 kernel: Lustre: server umount fs1-MDT0001 complete
Aug 20 23:19:11 dac-e-2 kernel: LustreError: 137-5: fs1-MDT0001_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-2 kernel: LustreError: 137-5: fs1-MDT0001_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-2 kernel: Lustre: 369756:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff8e0ee3321f80 x1642422109237472/t0(0) o39->fs1-MDT0000-lwp-OST000c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-2 kernel: Lustre: 370507:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff8e275c588900 x1642422109237488/t0(0) o39->fs1-MDT0000-lwp-OST0015@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-2 kernel: Lustre: 370559:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff8e27447b8900 x1642422109242096/t0(0) o39->fs1-MDT0000-lwp-OST0017@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-2 kernel: Lustre: fs1-MDT0001-lwp-OST0011: Connection to fs1-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:26 dac-e-2 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:27 dac-e-2 kernel: Lustre: Failing over fs1-OST0016
Aug 20 23:19:27 dac-e-2 kernel: Lustre: server umount fs1-OST0016 complete
Aug 20 23:19:28 dac-e-2 kernel: Lustre: Failing over fs1-OST000d
Aug 20 23:19:28 dac-e-2 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-2 kernel: Lustre: server umount fs1-OST000d complete
Aug 20 23:19:28 dac-e-2 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-2 kernel: Lustre: 370613:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff8e27447ba400 x1642422109242128/t0(0) o39->fs1-MDT0000-lwp-OST0010@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-2 kernel: Lustre: 370613:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 277 previous similar messages
Aug 20 23:19:31 dac-e-2 kernel: Lustre: Failing over fs1-OST0015
Aug 20 23:19:31 dac-e-2 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:31 dac-e-2 kernel: Lustre: server umount fs1-OST0015 complete
Aug 20 23:19:31 dac-e-2 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-21
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-21 kernel: Lustre: fs1-OST00f0: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-21 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 22:55:51 dac-e-21 kernel: Lustre: fs1-OST00f2: Connection restored to ba6d10d8-29ae-af16-bb29-fe0009074454 (at 10.47.21.33@o2ib1)
Aug 20 22:55:51 dac-e-21 kernel: Lustre: Skipped 78 previous similar messages
Aug 20 22:55:53 dac-e-21 kernel: LustreError: 33397:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff957b26c57a00
Aug 20 22:55:53 dac-e-21 kernel: LustreError: 33398:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff957b26c57a00
Aug 20 22:55:53 dac-e-21 kernel: LustreError: 33395:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff957b26c57a00
Aug 20 22:55:53 dac-e-21 kernel: LustreError: 33406:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff957b26c57a00
Aug 20 22:55:53 dac-e-21 kernel: LustreError: 33402:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9592e323ca00
Aug 20 22:55:53 dac-e-21 kernel: LustreError: 33399:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9592e323ca00
Aug 20 22:55:54 dac-e-21 kernel: Lustre: fs1-OST00f6: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-21 kernel: Lustre: fs1-OST00f5: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-21 kernel: LustreError: 43307:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff95944af6c850 x1642422175276368/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:55 dac-e-21 kernel: LustreError: 33398:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff957b93f65400
Aug 20 22:55:56 dac-e-21 kernel: Lustre: fs1-OST00f6: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-21 kernel: LustreError: 120762:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff957bf0e7b050 x1642422175273984/t0(0) o3->1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df@10.47.21.38@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:16 dac-e-21 kernel: Lustre: fs1-OST00f0: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-21 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 23:00:49 dac-e-21 kernel: Lustre: fs1-OST00f0: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 23:00:49 dac-e-21 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:00:49 dac-e-21 kernel: Lustre: fs1-OST00f0: Connection restored to 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1)
Aug 20 23:00:49 dac-e-21 kernel: Lustre: Skipped 43 previous similar messages
Aug 20 23:03:25 dac-e-21 kernel: Lustre: fs1-OST00f8: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:25 dac-e-21 kernel: Lustre: fs1-OST00f8: Connection restored to 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1)
Aug 20 23:03:25 dac-e-21 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:09:09 dac-e-21 kernel: Lustre: fs1-OST00f0: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:19:10 dac-e-21 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0014: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-21 kernel: Lustre: Failing over fs1-MDT0014
Aug 20 23:19:10 dac-e-21 kernel: Lustre: server umount fs1-MDT0014 complete
Aug 20 23:19:12 dac-e-21 kernel: LustreError: 137-5: fs1-MDT0014_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-21 kernel: LustreError: 137-5: fs1-MDT0014_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-21 kernel: Lustre: 353036:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff9592fc8ec800 x1642422109238080/t0(0) o39->fs1-MDT0000-lwp-OST00f0@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-21 kernel: Lustre: 353062:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff957b61cec380 x1642422109238096/t0(0) o39->fs1-MDT0000-lwp-OST00f9@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-21 kernel: Lustre: 353299:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffff9592fc8ebf00 x1642422109238112/t0(0) o39->fs1-MDT0000-lwp-OST00fb@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-21 kernel: Lustre: fs1-MDT0012-lwp-OST00f3: Connection to fs1-MDT0012 (at 10.47.18.19@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:29 dac-e-21 kernel: Lustre: 355264:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffff95933f1ba400 x1642422109242720/t0(0) o39->fs1-MDT0000-lwp-OST00f4@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-21 kernel: Lustre: 355264:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 274 previous similar messages
Aug 20 23:19:29 dac-e-21 kernel: Lustre: Failing over fs1-OST00f4
Aug 20 23:19:29 dac-e-21 kernel: Lustre: server umount fs1-OST00f4 complete
Aug 20 23:19:30 dac-e-21 kernel: Lustre: Failing over fs1-OST00f6
Aug 20 23:19:30 dac-e-21 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:30 dac-e-21 kernel: Lustre: server umount fs1-OST00f6 complete
Aug 20 23:19:30 dac-e-21 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-21 kernel: Lustre: Failing over fs1-OST00fb
Aug 20 23:19:33 dac-e-21 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:33 dac-e-21 kernel: Lustre: server umount fs1-OST00fb complete
Aug 20 23:19:33 dac-e-21 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:34 dac-e-21 kernel: Lustre: 354097:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339568/real 1566339568]  req@ffff9592fc8e8900 x1642422109242832/t0(0) o39->fs1-MDT0009-lwp-OST00f2@10.47.18.10@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339574 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:34 dac-e-21 kernel: Lustre: 354097:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 6 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-19
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-19 kernel: Lustre: fs1-OST00d8: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-19 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-19 kernel: Lustre: fs1-OST00e0: Connection restored to 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1)
Aug 20 22:55:51 dac-e-19 kernel: Lustre: Skipped 73 previous similar messages
Aug 20 22:55:53 dac-e-19 kernel: LustreError: 113392:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9c93e36b7200
Aug 20 22:55:53 dac-e-19 kernel: LustreError: 113381:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9c93e36b7200
Aug 20 22:55:53 dac-e-19 kernel: LustreError: 113382:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9c93e36b7200
Aug 20 22:55:53 dac-e-19 kernel: LustreError: 113384:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9c93e36b7200
Aug 20 22:55:53 dac-e-19 kernel: Lustre: fs1-OST00dd: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-19 kernel: LustreError: 266193:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9cc109714050 x1642422175276192/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-19 kernel: LustreError: 113386:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caad8aa4e00
Aug 20 22:55:55 dac-e-19 kernel: Lustre: fs1-OST00db: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-19 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:55 dac-e-19 kernel: LustreError: 113387:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caae1487a00
Aug 20 22:55:55 dac-e-19 kernel: LustreError: 113385:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caae1487a00
Aug 20 22:55:55 dac-e-19 kernel: Lustre: fs1-OST00dd: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113385:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cc0ea6bea00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113393:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cc0ea6bea00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113388:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cc0ea6bea00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113387:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9cc0ea6bea00
Aug 20 22:55:56 dac-e-19 kernel: Lustre: fs1-OST00d9: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-19 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 22:55:56 dac-e-19 kernel: Lustre: 199999:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338156/real 1566338156]  req@ffff9c933dbf3180 x1642422107583440/t0(0) o105->fs1-OST00de@10.47.21.36@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338163 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113393:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caad8aa4e00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113385:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caadbe83a00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113386:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caadbe83a00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113388:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caadbe83a00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113387:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caadbe83a00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113393:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caadbe83a00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 113387:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff9caadbe83a00
Aug 20 22:55:56 dac-e-19 kernel: LustreError: 269635:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9cc0f6540850 x1642422175273776/t0(0) o3->4f2faf4f-1754-0923-7bb5-26c935576df5@10.47.21.36@o2ib1:83/0 lens 488/440 e 1 to 0 dl 1566338183 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-19 kernel: Lustre: fs1-OST00d9: Bulk IO read error with 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-19 kernel: LustreError: 266193:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff9cc10970a050 x1642422175283520/t0(0) o3->4f2faf4f-1754-0923-7bb5-26c935576df5@10.47.21.36@o2ib1:124/0 lens 488/440 e 1 to 0 dl 1566338224 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-19 kernel: Lustre: fs1-OST00dd: Bulk IO read error with 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-19 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:01:11 dac-e-19 kernel: Lustre: fs1-OST00dd: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 23:01:11 dac-e-19 kernel: Lustre: fs1-OST00dd: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 23:01:11 dac-e-19 kernel: Lustre: Skipped 42 previous similar messages
Aug 20 23:01:25 dac-e-19 kernel: Lustre: fs1-OST00df: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 23:03:25 dac-e-19 kernel: Lustre: fs1-OST00e1: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:25 dac-e-19 kernel: Lustre: fs1-OST00e1: Connection restored to 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1)
Aug 20 23:03:25 dac-e-19 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:09:09 dac-e-19 kernel: Lustre: fs1-OST00dd: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-19 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:10 dac-e-19 kernel: Lustre: Failing over fs1-MDT0012
Aug 20 23:19:10 dac-e-19 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT0012: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-19 kernel: LustreError: 251511:0:(osp_dev.c:485:osp_disconnect()) fs1-MDT000d-osp-MDT0012: can't disconnect: rc = -19
Aug 20 23:19:10 dac-e-19 kernel: LustreError: 251511:0:(lod_dev.c:267:lod_sub_process_config()) fs1-MDT0012-mdtlov: error cleaning up LOD index 13: cmd 0xcf031 : rc = -19
Aug 20 23:19:10 dac-e-19 kernel: Lustre: server umount fs1-MDT0012 complete
Aug 20 23:19:11 dac-e-19 kernel: LustreError: 137-5: fs1-MDT0012_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-19 kernel: LustreError: 137-5: fs1-MDT0012_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-19 kernel: Lustre: 253333:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff9ca95a61de80 x1642422109237344/t0(0) o39->fs1-MDT0000-lwp-OST00d8@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-19 kernel: Lustre: 253770:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff9ca95345bf00 x1642422109237360/t0(0) o39->fs1-MDT0000-lwp-OST00e1@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-19 kernel: Lustre: 254473:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9c944967b180 x1642422109237376/t0(0) o39->fs1-MDT0000-lwp-OST00e3@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-19 kernel: Lustre: 255794:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff9caa7c38b180 x1642422109237408/t0(0) o39->fs1-MDT0000-lwp-OST00dc@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-19 kernel: Lustre: 255794:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:30 dac-e-19 kernel: Lustre: fs1-MDT0015-lwp-OST00df: Connection to fs1-MDT0015 (at 10.47.18.22@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:30 dac-e-19 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:31 dac-e-19 kernel: Lustre: Failing over fs1-OST00e1
Aug 20 23:19:31 dac-e-19 kernel: Lustre: server umount fs1-OST00e1 complete
Aug 20 23:19:32 dac-e-19 kernel: Lustre: Failing over fs1-OST00e3
Aug 20 23:19:32 dac-e-19 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:32 dac-e-19 kernel: Lustre: server umount fs1-OST00e3 complete
Aug 20 23:19:32 dac-e-19 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-19 kernel: Lustre: 257212:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff9cc0c6b22d00 x1642422109242048/t0(0) o39->fs1-MDT0000-lwp-OST00e2@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-19 kernel: Lustre: 257212:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 274 previous similar messages
Aug 20 23:19:34 dac-e-19 kernel: Lustre: Failing over fs1-OST00d9
Aug 20 23:19:34 dac-e-19 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:34 dac-e-19 kernel: Lustre: server umount fs1-OST00d9 complete
Aug 20 23:19:34 dac-e-19 kernel: Lustre: Skipped 3 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-20
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-20 kernel: Lustre: fs1-OST00e4: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-20 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-20 kernel: Lustre: fs1-OST00e5: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 22:55:51 dac-e-20 kernel: Lustre: fs1-OST00e7: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 22:55:51 dac-e-20 kernel: Lustre: Skipped 78 previous similar messages
Aug 20 22:55:53 dac-e-20 kernel: LustreError: 227000:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8e6b86800
Aug 20 22:55:53 dac-e-20 kernel: LustreError: 227003:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8e6b86800
Aug 20 22:55:53 dac-e-20 kernel: Lustre: fs1-OST00ec: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-20 kernel: LustreError: 318175:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff91d89b8f5850 x1642422175276752/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-20 kernel: Lustre: fs1-OST00ec: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227001:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227000:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227054:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227002:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227003:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227002:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227054:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227003:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227003:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227054:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227000:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227001:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227002:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227003:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227002:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:56 dac-e-20 kernel: LustreError: 227003:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff91d8548a8c00
Aug 20 22:55:58 dac-e-20 kernel: LNet: 226995:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:55:58 dac-e-20 kernel: Lustre: 254047:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338153/real 1566338158]  req@ffff91da113b8000 x1642422107579360/t0(0) o104->fs1-OST00e9@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338164 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:56:16 dac-e-20 kernel: Lustre: fs1-OST00e4: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-20 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-20 kernel: LustreError: 342581:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff91d82aeee850 x1642422175275280/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:16 dac-e-20 kernel: Lustre: fs1-OST00e5: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:22 dac-e-20 kernel: Lustre: fs1-OST00ed: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 22:56:22 dac-e-20 kernel: Lustre: Skipped 7 previous similar messages
Aug 20 23:00:46 dac-e-20 kernel: Lustre: fs1-OST00ee: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 23:00:46 dac-e-20 kernel: Lustre: Skipped 40 previous similar messages
Aug 20 23:09:09 dac-e-20 kernel: Lustre: fs1-OST00eb: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-20 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:10 dac-e-20 kernel: Lustre: Failing over fs1-MDT0013
Aug 20 23:19:10 dac-e-20 kernel: LustreError: 11-0: fs1-MDT0001-osp-MDT0013: operation mds_disconnect to node 10.47.18.2@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-20 kernel: Lustre: server umount fs1-MDT0013 complete
Aug 20 23:19:11 dac-e-20 kernel: LustreError: 137-5: fs1-MDT0013_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-20 kernel: LustreError: 137-5: fs1-MDT0013_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-20 kernel: Lustre: 345623:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff91d815a0e780 x1642422109236944/t0(0) o39->fs1-MDT0000-lwp-OST00e4@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-20 kernel: Lustre: 345648:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff91d815a0f080 x1642422109236960/t0(0) o39->fs1-MDT0000-lwp-OST00ed@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-20 kernel: Lustre: 345930:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff91d815a0e300 x1642422109236976/t0(0) o39->fs1-MDT0000-lwp-OST00ef@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-20 kernel: Lustre: 347324:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff91c11e31f980 x1642422109237008/t0(0) o39->fs1-MDT0000-lwp-OST00e8@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-20 kernel: Lustre: 347324:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:30 dac-e-20 kernel: Lustre: fs1-MDT0008-lwp-OST00eb: Connection to fs1-MDT0008 (at 10.47.18.9@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:30 dac-e-20 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:30 dac-e-20 kernel: Lustre: Failing over fs1-OST00e4
Aug 20 23:19:30 dac-e-20 kernel: Lustre: server umount fs1-OST00e4 complete
Aug 20 23:19:31 dac-e-20 kernel: Lustre: Failing over fs1-OST00ed
Aug 20 23:19:31 dac-e-20 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:31 dac-e-20 kernel: Lustre: server umount fs1-OST00ed complete
Aug 20 23:19:31 dac-e-20 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-20 kernel: Lustre: 349929:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff91d815a0a400 x1642422109241648/t0(0) o39->fs1-MDT0000-lwp-OST00ee@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-20 kernel: Lustre: 349929:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 276 previous similar messages
Aug 20 23:19:33 dac-e-20 kernel: Lustre: Failing over fs1-OST00e6
Aug 20 23:19:33 dac-e-20 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:33 dac-e-20 kernel: Lustre: server umount fs1-OST00e6 complete
Aug 20 23:19:33 dac-e-20 kernel: Lustre: Skipped 5 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-23
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:55:51 dac-e-23 kernel: Lustre: fs1-OST0108: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 22:55:51 dac-e-23 kernel: Lustre: Skipped 86 previous similar messages
Aug 20 22:55:53 dac-e-23 kernel: LustreError: 64096:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1641bf55a00
Aug 20 22:55:53 dac-e-23 kernel: LustreError: 64097:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa17be4aaea00
Aug 20 22:55:53 dac-e-23 kernel: LustreError: 64033:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa17be4aaea00
Aug 20 22:55:53 dac-e-23 kernel: LustreError: 64036:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa17be4aaea00
Aug 20 22:55:53 dac-e-23 kernel: Lustre: fs1-OST010a: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-23 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:53 dac-e-23 kernel: LustreError: 64034:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa17be4aaea00
Aug 20 22:55:53 dac-e-23 kernel: LustreError: 68431:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffffa162a3b06850 x1642422175273760/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-23 kernel: Lustre: fs1-OST010a: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:55:54 dac-e-23 kernel: LustreError: 71411:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffffa17bba76d050 x1642422175280240/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:64/0 lens 488/440 e 0 to 0 dl 1566338164 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-23 kernel: Lustre: fs1-OST010b: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 22:55:54 dac-e-23 kernel: Lustre: fs1-OST0108: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-23 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64035:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64033:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64034:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64036:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64097:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64036:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64097:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64035:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64034:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64036:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64033:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64097:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:55 dac-e-23 kernel: LustreError: 64036:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623d2bfa00
Aug 20 22:55:56 dac-e-23 kernel: Lustre: fs1-OST0112: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-23 kernel: LustreError: 64032:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623567e600
Aug 20 22:55:56 dac-e-23 kernel: LustreError: 64031:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623567e600
Aug 20 22:55:56 dac-e-23 kernel: LustreError: 64030:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623567e600
Aug 20 22:55:56 dac-e-23 kernel: LustreError: 64096:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623567e600
Aug 20 22:55:56 dac-e-23 kernel: LustreError: 64029:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa1623567e600
Aug 20 22:55:58 dac-e-23 kernel: LNet: 64028:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:56:00 dac-e-23 kernel: LNet: 64028:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 0 seconds
Aug 20 22:56:00 dac-e-23 kernel: Lustre: fs1-OST010c: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:01 dac-e-23 kernel: LNet: 64028:0:(o2iblnd_cb.c:1495:kiblnd_reconnect_peer()) Abort reconnection of 10.47.21.34@o2ib1: connected
Aug 20 22:56:16 dac-e-23 kernel: Lustre: fs1-OST0108: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-23 kernel: LustreError: 246100:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffffa17c37a21850 x1642422175273632/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:16 dac-e-23 kernel: Lustre: fs1-OST010d: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-23 kernel: LustreError: 128508:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffffa1628a29c050 x1642422175279168/t0(0) o3->1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df@10.47.21.38@o2ib1:117/0 lens 488/440 e 2 to 0 dl 1566338217 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-23 kernel: Lustre: fs1-OST0110: Bulk IO read error with 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1), client will retry: rc -110
Aug 20 23:01:10 dac-e-23 kernel: Lustre: fs1-OST0110: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 23:01:10 dac-e-23 kernel: Lustre: Skipped 7 previous similar messages
Aug 20 23:01:37 dac-e-23 kernel: Lustre: fs1-OST010b: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:03:25 dac-e-23 kernel: Lustre: fs1-OST010f: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-23 kernel: Lustre: fs1-OST010d: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-23 kernel: Lustre: Skipped 55 previous similar messages
Aug 20 23:19:10 dac-e-23 kernel: LustreError: 11-0: fs1-MDT0011-osp-MDT0016: operation mds_statfs to node 10.47.18.18@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-23 kernel: Lustre: fs1-MDT0011-osp-MDT0016: Connection to fs1-MDT0011 (at 10.47.18.18@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:10 dac-e-23 kernel: Lustre: Failing over fs1-MDT0016
Aug 20 23:19:10 dac-e-23 kernel: LustreError: 128039:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffffa17ad067de80 x1642422109232416/t0(0) o41->fs1-MDT0003-osp-MDT0016@10.47.18.4@o2ib1:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
Aug 20 23:19:10 dac-e-23 kernel: Lustre: server umount fs1-MDT0016 complete
Aug 20 23:19:11 dac-e-23 kernel: LustreError: 137-5: fs1-MDT0016_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:11 dac-e-23 kernel: LustreError: Skipped 4 previous similar messages
Aug 20 23:19:13 dac-e-23 kernel: LustreError: 137-5: fs1-MDT0016_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-23 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:24 dac-e-23 kernel: Lustre: 406680:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffffa162b1f48900 x1642422109236528/t0(0) o39->fs1-MDT0000-lwp-OST0108@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-23 kernel: Lustre: 407321:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffffa162e039b180 x1642422109236544/t0(0) o39->fs1-MDT0000-lwp-OST0111@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-23 kernel: Lustre: 407348:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffffa17ad1ba4800 x1642422109241152/t0(0) o39->fs1-MDT0000-lwp-OST0113@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-23 kernel: Lustre: fs1-MDT0001-lwp-OST010d: Connection to fs1-MDT0001 (at 10.47.18.2@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:27 dac-e-23 kernel: Lustre: Failing over fs1-OST0112
Aug 20 23:19:28 dac-e-23 kernel: Lustre: server umount fs1-OST0112 complete
Aug 20 23:19:29 dac-e-23 kernel: Lustre: Failing over fs1-OST0109
Aug 20 23:19:29 dac-e-23 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-23 kernel: Lustre: server umount fs1-OST0109 complete
Aug 20 23:19:29 dac-e-23 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-23 kernel: Lustre: 407398:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffffa17ad1ba6c00 x1642422109241184/t0(0) o39->fs1-MDT0000-lwp-OST010c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-23 kernel: Lustre: 407398:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 275 previous similar messages
Aug 20 23:19:31 dac-e-23 kernel: Lustre: Failing over fs1-OST010e
Aug 20 23:19:31 dac-e-23 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:31 dac-e-23 kernel: Lustre: server umount fs1-OST010e complete
Aug 20 23:19:31 dac-e-23 kernel: Lustre: Skipped 3 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-9
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-05-07 20:43:31 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-9 kernel: Lustre: fs1-OST0060: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-9 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 22:55:51 dac-e-9 kernel: Lustre: fs1-OST006b: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 22:55:51 dac-e-9 kernel: Lustre: Skipped 78 previous similar messages
Aug 20 22:55:53 dac-e-9 kernel: Lustre: 111107:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff8ce0a2ef8480 x1642422107576992/t0(0) o105->fs1-OST006b@10.47.21.31@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:53 dac-e-9 kernel: LustreError: 48510:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8cf6dabf3e00
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48511:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48514:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48512:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48513:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48519:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48511:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48513:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:54 dac-e-9 kernel: LustreError: 48512:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:55 dac-e-9 kernel: Lustre: fs1-OST0066: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48512:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0e64b51a00
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48511:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48519:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48513:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0e64b51a00
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48514:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48512:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:55 dac-e-9 kernel: Lustre: 113182:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338155]  req@ffff8d0ff7b79f80 x1642422107581632/t0(0) o104->fs1-OST0064@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338161 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48513:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:55 dac-e-9 kernel: LustreError: 48514:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8d0fa0c2e200
Aug 20 22:55:57 dac-e-9 kernel: Lustre: fs1-OST0067: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:57 dac-e-9 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:57 dac-e-9 kernel: LustreError: 113198:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8d0efbc1d850 x1642422175274896/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:87/0 lens 488/440 e 1 to 0 dl 1566338187 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:57 dac-e-9 kernel: Lustre: fs1-OST0067: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:55:58 dac-e-9 kernel: LNet: 48506:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:55:58 dac-e-9 kernel: Lustre: 115602:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338155/real 1566338158]  req@ffff8d0ff7b79b00 x1642422107577232/t0(0) o104->fs1-OST0062@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Aug 20 22:55:58 dac-e-9 kernel: Lustre: 115602:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 22:55:58 dac-e-9 kernel: Lustre: fs1-OST006b: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-9 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:58 dac-e-9 kernel: LustreError: 138256:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8d0efbc12050 x1642422175280656/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:93/0 lens 488/440 e 1 to 0 dl 1566338193 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:58 dac-e-9 kernel: Lustre: fs1-OST006b: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:16 dac-e-9 kernel: Lustre: fs1-OST0068: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:41 dac-e-9 kernel: Lustre: fs1-OST0062: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:41 dac-e-9 kernel: LustreError: 138255:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8d0ef9caf850 x1642422175274480/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:107/0 lens 488/440 e 2 to 0 dl 1566338207 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:41 dac-e-9 kernel: Lustre: fs1-OST0062: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-9 kernel: LustreError: 239272:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff8cf6f3e13050 x1642422175275056/t0(0) o3->1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df@10.47.21.38@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-9 kernel: LustreError: 53264:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff8cf6f3e12050 x1642422175275600/t0(0) o3->1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df@10.47.21.38@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-9 kernel: Lustre: fs1-OST0069: Bulk IO read error with 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-9 kernel: LustreError: 239272:0:(ldlm_lib.c:3268:target_bulk_io()) Skipped 4 previous similar messages
Aug 20 23:00:48 dac-e-9 kernel: Lustre: fs1-OST0068: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 23:00:48 dac-e-9 kernel: Lustre: fs1-OST0068: Connection restored to 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1)
Aug 20 23:00:48 dac-e-9 kernel: Lustre: Skipped 34 previous similar messages
Aug 20 23:01:08 dac-e-9 kernel: Lustre: fs1-OST0061: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:01:41 dac-e-9 kernel: Lustre: fs1-OST0065: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:03:07 dac-e-9 kernel: Lustre: fs1-OST0069: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 23:07:13 dac-e-9 kernel: Lustre: fs1-OST006a: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:07:13 dac-e-9 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:07:13 dac-e-9 kernel: Lustre: fs1-OST006a: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:07:13 dac-e-9 kernel: Lustre: Skipped 7 previous similar messages
Aug 20 23:19:10 dac-e-9 kernel: LustreError: 11-0: fs1-MDT0002-osp-MDT0008: operation mds_statfs to node 10.47.18.3@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-9 kernel: Lustre: fs1-MDT0002-osp-MDT0008: Connection to fs1-MDT0002 (at 10.47.18.3@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:11 dac-e-9 kernel: LustreError: 11-0: fs1-MDT0011-osp-MDT0008: operation mds_statfs to node 10.47.18.18@o2ib1 failed: rc = -107
Aug 20 23:19:11 dac-e-9 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:11 dac-e-9 kernel: Lustre: fs1-MDT0011-osp-MDT0008: Connection to fs1-MDT0011 (at 10.47.18.18@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:11 dac-e-9 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:12 dac-e-9 kernel: LustreError: 11-0: fs1-MDT000d-osp-MDT0008: operation mds_statfs to node 10.47.18.14@o2ib1 failed: rc = -107
Aug 20 23:19:12 dac-e-9 kernel: LustreError: Skipped 11 previous similar messages
Aug 20 23:19:12 dac-e-9 kernel: Lustre: fs1-MDT000d-osp-MDT0008: Connection to fs1-MDT000d (at 10.47.18.14@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:12 dac-e-9 kernel: Lustre: Skipped 11 previous similar messages
Aug 20 23:19:15 dac-e-9 kernel: LustreError: 11-0: fs1-MDT0003-osp-MDT0008: operation mds_statfs to node 10.47.18.4@o2ib1 failed: rc = -107
Aug 20 23:19:15 dac-e-9 kernel: LustreError: Skipped 5 previous similar messages
Aug 20 23:19:15 dac-e-9 kernel: Lustre: fs1-MDT0003-osp-MDT0008: Connection to fs1-MDT0003 (at 10.47.18.4@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:15 dac-e-9 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:16 dac-e-9 kernel: Lustre: 383753:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339550/real 1566339550]  req@ffff8d1007a0c380 x1642422109231584/t0(0) o39->fs1-MDT0000-lwp-MDT0008@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339556 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:16 dac-e-9 kernel: Lustre: Failing over fs1-MDT0008
Aug 20 23:19:17 dac-e-9 kernel: Lustre: server umount fs1-MDT0008 complete
Aug 20 23:19:24 dac-e-9 kernel: Lustre: 384011:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff8d0e7fb28000 x1642422109243328/t0(0) o39->fs1-MDT0000-lwp-OST0060@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-9 kernel: Lustre: 385081:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff8cf625e14800 x1642422109243344/t0(0) o39->fs1-MDT0000-lwp-OST0069@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-9 kernel: Lustre: 386495:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff8d1007a09680 x1642422109243376/t0(0) o39->fs1-MDT0000-lwp-OST0062@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-9 kernel: Lustre: 386495:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:31 dac-e-9 kernel: Lustre: fs1-MDT0017-lwp-OST0067: Connection to fs1-MDT0017 (at 10.47.18.24@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:32 dac-e-9 kernel: Lustre: Failing over fs1-OST0069
Aug 20 23:19:32 dac-e-9 kernel: Lustre: server umount fs1-OST0069 complete
Aug 20 23:19:33 dac-e-9 kernel: Lustre: 387182:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff8d0eb6ead100 x1642422109247984/t0(0) o39->fs1-MDT0000-lwp-OST0068@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-9 kernel: Lustre: 387182:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 271 previous similar messages
Aug 20 23:19:33 dac-e-9 kernel: Lustre: Failing over fs1-OST006b
Aug 20 23:19:33 dac-e-9 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-9 kernel: Lustre: server umount fs1-OST006b complete
Aug 20 23:19:33 dac-e-9 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:35 dac-e-9 kernel: Lustre: Failing over fs1-OST0064
Aug 20 23:19:35 dac-e-9 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:36 dac-e-9 kernel: Lustre: server umount fs1-OST0064 complete
Aug 20 23:19:36 dac-e-9 kernel: Lustre: Skipped 3 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-6
-------------------------------------------------------------------------------
-- Logs begin at Thu 2019-05-09 08:54:36 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-6 kernel: Lustre: fs1-OST003c: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-6 kernel: Lustre: Skipped 8 previous similar messages
Aug 20 22:55:51 dac-e-6 kernel: Lustre: fs1-OST0047: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:55:51 dac-e-6 kernel: Lustre: Skipped 76 previous similar messages
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58242:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdb55b70000
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58245:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdb55b70000
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58244:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdb55b70000
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58243:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdb55b70000
Aug 20 22:55:53 dac-e-6 kernel: Lustre: 322699:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff8fdb3eeab180 x1642422107578976/t0(0) o105->fs1-OST0044@10.47.21.31@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338158 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58249:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff28fcf4200
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58292:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff28fcf4200
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58244:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdb55b70000
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58246:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff28fcf4200
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff28fcf4200
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58249:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdab17b2800
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58292:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdab17b2800
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58248:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdab17b2800
Aug 20 22:55:53 dac-e-6 kernel: Lustre: fs1-OST003d: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58249:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdab17b2800
Aug 20 22:55:53 dac-e-6 kernel: LustreError: 58246:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fdab17b2800
Aug 20 22:55:54 dac-e-6 kernel: LustreError: 246170:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8fdaeaf94050 x1642422175273856/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:55 dac-e-6 kernel: Lustre: 153995:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338155]  req@ffff8ff1e1adc380 x1642422107578576/t0(0) o104->fs1-OST0043@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:55 dac-e-6 kernel: Lustre: 153995:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 22:55:55 dac-e-6 kernel: Lustre: fs1-OST0043: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-6 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:55 dac-e-6 kernel: LustreError: 58292:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:55 dac-e-6 kernel: LustreError: 58249:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:55 dac-e-6 kernel: LustreError: 58247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:55 dac-e-6 kernel: LustreError: 58248:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58291:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fda64e55000
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58243:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fda64e55000
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58244:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fda64e55000
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58248:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:56 dac-e-6 kernel: LNet: 58291:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.35@o2ib1
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:56 dac-e-6 kernel: Lustre: fs1-OST003d: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58249:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 58247:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8ff18c73ea00
Aug 20 22:55:56 dac-e-6 kernel: LustreError: 149469:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8ff266c2b850 x1642422175273616/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:57 dac-e-6 kernel: Lustre: fs1-OST003d: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:57 dac-e-6 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:55:58 dac-e-6 kernel: LustreError: 187214:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8ff280cb3850 x1642422175273536/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:87/0 lens 488/440 e 1 to 0 dl 1566338187 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:58 dac-e-6 kernel: Lustre: fs1-OST0041: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:55:58 dac-e-6 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-6 kernel: Lustre: fs1-OST0043: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-6 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:56:23 dac-e-6 kernel: Lustre: fs1-OST003d: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:56:23 dac-e-6 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:24 dac-e-6 kernel: LustreError: 264305:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8fdaeaf92050 x1642422175273520/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:24 dac-e-6 kernel: Lustre: fs1-OST0042: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:56:50 dac-e-6 kernel: LustreError: 245754:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff8ff280cb6050 x1642422175277616/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:137/0 lens 488/440 e 3 to 0 dl 1566338237 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:50 dac-e-6 kernel: Lustre: fs1-OST0044: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 23:01:08 dac-e-6 kernel: Lustre: fs1-OST0044: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:01:08 dac-e-6 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:01:08 dac-e-6 kernel: Lustre: fs1-OST0044: Connection restored to 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1)
Aug 20 23:01:08 dac-e-6 kernel: Lustre: Skipped 42 previous similar messages
Aug 20 23:03:33 dac-e-6 kernel: Lustre: fs1-OST0043: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:03:33 dac-e-6 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:09:09 dac-e-6 kernel: Lustre: fs1-OST003d: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-6 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:10 dac-e-6 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0005: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-6 kernel: Lustre: Failing over fs1-MDT0005
Aug 20 23:19:10 dac-e-6 kernel: Lustre: server umount fs1-MDT0005 complete
Aug 20 23:19:12 dac-e-6 kernel: LustreError: 137-5: fs1-MDT0005_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-6 kernel: LustreError: 137-5: fs1-MDT0005_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-6 kernel: Lustre: 258514:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff8fdb0571e300 x1642422109238096/t0(0) o39->fs1-MDT0000-lwp-OST003c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:24 dac-e-6 kernel: Lustre: 258514:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Aug 20 23:19:26 dac-e-6 kernel: Lustre: 258553:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff8fdb05719200 x1642422109238112/t0(0) o39->fs1-MDT0000-lwp-OST0045@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-6 kernel: Lustre: 259182:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffff8fdb05718000 x1642422109238128/t0(0) o39->fs1-MDT0000-lwp-OST0047@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-6 kernel: Lustre: 147298:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff8fd9d27a1200 x1642422109242400/t0(0) o400->fs1-MDT000c-lwp-OST0043@10.47.18.13@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-6 kernel: Lustre: 147298:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-6 kernel: Lustre: fs1-MDT000c-lwp-OST0043: Connection to fs1-MDT000c (at 10.47.18.13@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:29 dac-e-6 kernel: Lustre: fs1-MDT0003-lwp-OST0043: Connection to fs1-MDT0003 (at 10.47.18.4@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:29 dac-e-6 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-6 kernel: Lustre: Failing over fs1-OST0040
Aug 20 23:19:30 dac-e-6 kernel: Lustre: server umount fs1-OST0040 complete
Aug 20 23:19:31 dac-e-6 kernel: Lustre: Failing over fs1-OST0042
Aug 20 23:19:31 dac-e-6 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:31 dac-e-6 kernel: Lustre: server umount fs1-OST0042 complete
Aug 20 23:19:31 dac-e-6 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-6 kernel: Lustre: Failing over fs1-OST0047
Aug 20 23:19:33 dac-e-6 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:33 dac-e-6 kernel: Lustre: server umount fs1-OST0047 complete
Aug 20 23:19:33 dac-e-6 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:33 dac-e-6 kernel: Lustre: 262188:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff8fdb408d6780 x1642422109242816/t0(0) o39->fs1-MDT0000-lwp-OST0046@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-6 kernel: Lustre: 262188:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 277 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-22
-------------------------------------------------------------------------------
-- Logs begin at Thu 2019-05-09 10:24:44 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-22 kernel: Lustre: fs1-OST00fc: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-22 kernel: Lustre: Skipped 10 previous similar messages
Aug 20 22:55:51 dac-e-22 kernel: Lustre: fs1-OST0102: Connection restored to a9c07af9-d96f-6905-bdea-228af9a88046 (at 10.47.21.32@o2ib1)
Aug 20 22:55:51 dac-e-22 kernel: Lustre: Skipped 72 previous similar messages
Aug 20 22:55:53 dac-e-22 kernel: Lustre: fs1-OST0101: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-22 kernel: LustreError: 78668:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:53 dac-e-22 kernel: LustreError: 78662:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78668:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78662:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78668:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78662:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78668:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78661:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78662:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78661:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:54 dac-e-22 kernel: Lustre: 183057:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338154]  req@ffff90095927e300 x1642422107580928/t0(0) o104->fs1-OST00fd@10.47.21.38@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338165 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78657:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90073dfeb000
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78667:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90073dfeb000
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78658:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90073dfeb000
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78659:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90073dfeb000
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78662:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78660:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78663:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff900738ab0800
Aug 20 22:55:54 dac-e-22 kernel: Lustre: 169380:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338154]  req@ffff8ff2d957c380 x1642422107580944/t0(0) o105->fs1-OST0101@10.47.21.34@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338165 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:54 dac-e-22 kernel: Lustre: 169380:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78663:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:54 dac-e-22 kernel: LustreError: 78660:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff90200a8f9200
Aug 20 22:55:55 dac-e-22 kernel: Lustre: fs1-OST00fc: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-22 kernel: LustreError: 262072:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff900953b33050 x1642422175278000/t0(0) o3->1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df@10.47.21.38@o2ib1:66/0 lens 488/440 e 0 to 0 dl 1566338166 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-22 kernel: Lustre: fs1-OST00fe: Bulk IO read error with 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1), client will retry: rc -110
Aug 20 22:55:58 dac-e-22 kernel: Lustre: fs1-OST0101: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-22 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:55:59 dac-e-22 kernel: LNet: 78655:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 0 seconds
Aug 20 22:55:59 dac-e-22 kernel: Lustre: 169380:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338159]  req@ffff8ff2d957c380 x1642422107580944/t0(0) o105->fs1-OST0101@10.47.21.34@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338165 ref 1 fl Rpc:eXS/2/ffffffff rc -11/-1
Aug 20 22:56:00 dac-e-22 kernel: LNet: 78655:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:56:07 dac-e-22 kernel: Lustre: 172263:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338156/real 1566338156]  req@ffff90092ffb2d00 x1642422107582096/t0(0) o104->fs1-OST0102@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338167 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 22:56:07 dac-e-22 kernel: Lustre: 172263:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-22 kernel: Lustre: fs1-OST00fe: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-22 kernel: LustreError: 207893:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff901fbda4a050 x1642422175274752/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:87/0 lens 488/440 e 1 to 0 dl 1566338187 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:16 dac-e-22 kernel: Lustre: fs1-OST00fc: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:16 dac-e-22 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-22 kernel: LustreError: 207893:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 2 previous similar messages
Aug 20 22:56:21 dac-e-22 kernel: Lustre: fs1-OST00fe: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 22:56:21 dac-e-22 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 22:56:49 dac-e-22 kernel: LustreError: 207984:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 14680064(16777216)  req@ffff901fbda48850 x1642422175285376/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:133/0 lens 488/440 e 1 to 0 dl 1566338233 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-22 kernel: Lustre: fs1-OST0101: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-22 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:00:46 dac-e-22 kernel: Lustre: fs1-OST0106: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 23:00:46 dac-e-22 kernel: Lustre: Skipped 47 previous similar messages
Aug 20 23:03:24 dac-e-22 kernel: Lustre: fs1-OST0102: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-22 kernel: Lustre: fs1-OST00fc: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-22 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:10 dac-e-22 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0015: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-22 kernel: Lustre: Failing over fs1-MDT0015
Aug 20 23:19:10 dac-e-22 kernel: Lustre: server umount fs1-MDT0015 complete
Aug 20 23:19:11 dac-e-22 kernel: LustreError: 137-5: fs1-MDT0015_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-22 kernel: LustreError: 137-5: fs1-MDT0015_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-22 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:24 dac-e-22 kernel: Lustre: 272486:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff9007736cde80 x1642422109236944/t0(0) o39->fs1-MDT0000-lwp-OST00fc@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:25 dac-e-22 kernel: Lustre: 273502:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339559/real 1566339559]  req@ffff90077cf53a80 x1642422109236960/t0(0) o39->fs1-MDT0000-lwp-OST0105@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339565 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-22 kernel: Lustre: 274302:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffff902052189680 x1642422109236976/t0(0) o39->fs1-MDT0000-lwp-OST0107@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-22 kernel: Lustre: 276242:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffff90077cf50480 x1642422109237008/t0(0) o39->fs1-MDT0000-lwp-OST0100@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-22 kernel: Lustre: 276242:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:31 dac-e-22 kernel: Lustre: fs1-MDT0006-lwp-OST0101: Connection to fs1-MDT0006 (at 10.47.18.7@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:31 dac-e-22 kernel: Lustre: Failing over fs1-OST0101
Aug 20 23:19:32 dac-e-22 kernel: Lustre: server umount fs1-OST0101 complete
Aug 20 23:19:33 dac-e-22 kernel: Lustre: Failing over fs1-OST0103
Aug 20 23:19:33 dac-e-22 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-22 kernel: Lustre: server umount fs1-OST0103 complete
Aug 20 23:19:33 dac-e-22 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:34 dac-e-22 kernel: Lustre: 275697:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339568/real 1566339568]  req@ffff9007496cc380 x1642422109241648/t0(0) o39->fs1-MDT0009-lwp-OST00fe@10.47.18.10@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339574 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:34 dac-e-22 kernel: Lustre: 275697:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 271 previous similar messages
Aug 20 23:19:35 dac-e-22 kernel: Lustre: Failing over fs1-OST0100
Aug 20 23:19:35 dac-e-22 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:35 dac-e-22 kernel: Lustre: server umount fs1-OST0100 complete
Aug 20 23:19:35 dac-e-22 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-8
-------------------------------------------------------------------------------
-- Logs begin at Thu 2019-05-09 08:54:37 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:55:51 dac-e-8 kernel: Lustre: fs1-OST005f: Connection restored to a9c07af9-d96f-6905-bdea-228af9a88046 (at 10.47.21.32@o2ib1)
Aug 20 22:55:51 dac-e-8 kernel: Lustre: Skipped 91 previous similar messages
Aug 20 22:55:53 dac-e-8 kernel: Lustre: fs1-OST0056: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 49823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1d07aee00
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 49825:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1e7661c00
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1e7661c00
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 49831:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1e7661c00
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 49824:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1e7661c00
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 245584:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8fc1e9a29050 x1642422175277152/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-8 kernel: LustreError: 245584:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 1 previous similar message
Aug 20 22:55:53 dac-e-8 kernel: Lustre: fs1-OST0056: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49824:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49825:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49831:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49825:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49824:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49831:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49825:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49823:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49824:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49825:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49831:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc2d87ef200
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 180110:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8fc1e9a36050 x1642422175275344/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-8 kernel: Lustre: fs1-OST005a: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:55:54 dac-e-8 kernel: LNet: 49826:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.36@o2ib1
Aug 20 22:55:54 dac-e-8 kernel: LNetError: 49826:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:54 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -125, desc ffff8fc1d07aee00
Aug 20 22:55:56 dac-e-8 kernel: LustreError: 49831:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1d8bec400
Aug 20 22:55:56 dac-e-8 kernel: LustreError: 49825:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1d8bec400
Aug 20 22:55:56 dac-e-8 kernel: LustreError: 49826:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1d8bec400
Aug 20 22:55:56 dac-e-8 kernel: LustreError: 49824:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff8fc1d8bec400
Aug 20 22:55:56 dac-e-8 kernel: Lustre: fs1-OST005d: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-8 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:55:56 dac-e-8 kernel: LustreError: 180116:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8fc1e9a2f850 x1642422175274896/t0(0) o3->4f2faf4f-1754-0923-7bb5-26c935576df5@10.47.21.36@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-8 kernel: Lustre: fs1-OST005d: Bulk IO read error with 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-8 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:58 dac-e-8 kernel: LNet: 49818:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:55:58 dac-e-8 kernel: Lustre: fs1-OST0058: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-8 kernel: Lustre: fs1-OST005a: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-8 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:21 dac-e-8 kernel: Lustre: fs1-OST005e: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 22:56:21 dac-e-8 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:49 dac-e-8 kernel: LustreError: 245584:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 14680064(16777216)  req@ffff8fc1e9a3a850 x1642422175279344/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:117/0 lens 488/440 e 2 to 0 dl 1566338217 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-8 kernel: Lustre: fs1-OST005f: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 23:01:41 dac-e-8 kernel: Lustre: fs1-OST005f: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:01:41 dac-e-8 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:03:24 dac-e-8 kernel: Lustre: fs1-MDT0007: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:24 dac-e-8 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:06:06 dac-e-8 kernel: Lustre: fs1-OST005f: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 23:06:06 dac-e-8 kernel: Lustre: fs1-OST005f: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:06:06 dac-e-8 kernel: Lustre: Skipped 49 previous similar messages
Aug 20 23:19:10 dac-e-8 kernel: LustreError: 11-0: fs1-MDT0017-osp-MDT0007: operation mds_statfs to node 10.47.18.24@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-8 kernel: Lustre: fs1-MDT0017-osp-MDT0007: Connection to fs1-MDT0017 (at 10.47.18.24@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:11 dac-e-8 kernel: LustreError: 11-0: fs1-MDT000a-osp-MDT0007: operation mds_statfs to node 10.47.18.11@o2ib1 failed: rc = -107
Aug 20 23:19:11 dac-e-8 kernel: LustreError: Skipped 1 previous similar message
Aug 20 23:19:11 dac-e-8 kernel: Lustre: fs1-MDT000a-osp-MDT0007: Connection to fs1-MDT000a (at 10.47.18.11@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:11 dac-e-8 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:12 dac-e-8 kernel: LustreError: 11-0: fs1-MDT000b-osp-MDT0007: operation mds_statfs to node 10.47.18.12@o2ib1 failed: rc = -107
Aug 20 23:19:12 dac-e-8 kernel: Lustre: fs1-MDT000c-osp-MDT0007: Connection to fs1-MDT000c (at 10.47.18.13@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:12 dac-e-8 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:12 dac-e-8 kernel: LustreError: Skipped 4 previous similar messages
Aug 20 23:19:15 dac-e-8 kernel: LustreError: 11-0: fs1-MDT0002-osp-MDT0007: operation mds_statfs to node 10.47.18.3@o2ib1 failed: rc = -107
Aug 20 23:19:15 dac-e-8 kernel: LustreError: Skipped 12 previous similar messages
Aug 20 23:19:15 dac-e-8 kernel: Lustre: fs1-MDT0002-osp-MDT0007: Connection to fs1-MDT0002 (at 10.47.18.3@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:15 dac-e-8 kernel: Lustre: Skipped 14 previous similar messages
Aug 20 23:19:16 dac-e-8 kernel: Lustre: 246591:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339550/real 1566339550]  req@ffff8fc180ef0d80 x1642422109232240/t0(0) o39->fs1-MDT0000-lwp-MDT0007@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339556 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:16 dac-e-8 kernel: Lustre: Failing over fs1-MDT0007
Aug 20 23:19:16 dac-e-8 kernel: Lustre: fs1-MDT0007: Not available for connect from 10.47.18.5@o2ib1 (stopping)
Aug 20 23:19:17 dac-e-8 kernel: Lustre: server umount fs1-MDT0007 complete
Aug 20 23:19:24 dac-e-8 kernel: Lustre: 247241:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff8fa93b833600 x1642422109242768/t0(0) o39->fs1-MDT0000-lwp-OST0054@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-8 kernel: Lustre: 247979:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff8fc19f4a1680 x1642422109242784/t0(0) o39->fs1-MDT0000-lwp-OST005d@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-8 kernel: Lustre: 249534:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff8fc19f4a3600 x1642422109242816/t0(0) o39->fs1-MDT0000-lwp-OST0056@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-8 kernel: Lustre: 249534:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:32 dac-e-8 kernel: Lustre: 252247:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339566/real 1566339566]  req@ffff8fc1697d3a80 x1642422109247408/t0(0) o39->fs1-MDT0000-lwp-OST005c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339572 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:32 dac-e-8 kernel: Lustre: 252247:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Aug 20 23:19:32 dac-e-8 kernel: Lustre: fs1-MDT0007-lwp-OST0059: Connection to fs1-MDT0007 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:33 dac-e-8 kernel: Lustre: Failing over fs1-OST005b
Aug 20 23:19:33 dac-e-8 kernel: Lustre: server umount fs1-OST005b complete
Aug 20 23:19:34 dac-e-8 kernel: Lustre: Failing over fs1-OST0056
Aug 20 23:19:34 dac-e-8 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:34 dac-e-8 kernel: Lustre: server umount fs1-OST0056 complete
Aug 20 23:19:34 dac-e-8 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:37 dac-e-8 kernel: Lustre: Failing over fs1-OST005a
Aug 20 23:19:37 dac-e-8 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:37 dac-e-8 kernel: Lustre: server umount fs1-OST005a complete
Aug 20 23:19:37 dac-e-8 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-4
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-05-07 14:07:28 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-4 kernel: Lustre: fs1-OST0024: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-4 kernel: Lustre: Skipped 11 previous similar messages
Aug 20 22:55:51 dac-e-4 kernel: Lustre: fs1-OST002f: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 22:55:51 dac-e-4 kernel: Lustre: Skipped 77 previous similar messages
Aug 20 22:55:53 dac-e-4 kernel: LNetError: 42622:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0)
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42624:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42626:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42678:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42623:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42625:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42623:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42625:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42626:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42678:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42623:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42624:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42625:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42626:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42678:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42624:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42623:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc0e3eee00
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42622:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff97fc13a43200
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42622:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff97fc13a43200
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42624:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc13a43200
Aug 20 22:55:53 dac-e-4 kernel: LustreError: 42626:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff97fc13a43200
Aug 20 22:55:54 dac-e-4 kernel: Lustre: fs1-OST002b: Client 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-4 kernel: LustreError: 226263:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff97fd2cee9850 x1642422175275568/t0(0) o3->1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df@10.47.21.38@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-4 kernel: Lustre: fs1-OST002b: Bulk IO read error with 1c9f1f8c-ae8f-76b3-441d-4d74e9e3a7df (at 10.47.21.38@o2ib1), client will retry: rc -110
Aug 20 22:55:55 dac-e-4 kernel: Lustre: fs1-OST002e: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-4 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42630:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42679:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42628:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42629:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42630:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42679:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42628:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42627:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42629:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42627:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42629:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42679:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42679:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42628:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42629:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 42628:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff981393ed2800
Aug 20 22:55:56 dac-e-4 kernel: Lustre: fs1-OST002f: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-4 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:55:56 dac-e-4 kernel: LustreError: 132350:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff98143927b850 x1642422175275248/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:82/0 lens 488/440 e 1 to 0 dl 1566338182 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-4 kernel: Lustre: fs1-OST002f: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:16 dac-e-4 kernel: Lustre: fs1-OST002e: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:49 dac-e-4 kernel: LustreError: 252336:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff97fceaf4f050 x1642422175275248/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-4 kernel: Lustre: fs1-OST0024: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 23:01:37 dac-e-4 kernel: Lustre: fs1-OST0027: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:01:37 dac-e-4 kernel: Lustre: Skipped 36 previous similar messages
Aug 20 23:03:07 dac-e-4 kernel: Lustre: fs1-OST0024: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-4 kernel: Lustre: fs1-OST0025: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-4 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:17:41 dac-e-4 kernel: perf: interrupt took too long (4904 > 4901), lowering kernel.perf_event_max_sample_rate to 40000
Aug 20 23:19:10 dac-e-4 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0003: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-4 kernel: Lustre: Failing over fs1-MDT0003
Aug 20 23:19:10 dac-e-4 kernel: LustreError: 236112:0:(osp_dev.c:485:osp_disconnect()) fs1-MDT000f-osp-MDT0003: can't disconnect: rc = -19
Aug 20 23:19:10 dac-e-4 kernel: LustreError: 236112:0:(lod_dev.c:267:lod_sub_process_config()) fs1-MDT0003-mdtlov: error cleaning up LOD index 15: cmd 0xcf031 : rc = -19
Aug 20 23:19:10 dac-e-4 kernel: Lustre: server umount fs1-MDT0003 complete
Aug 20 23:19:11 dac-e-4 kernel: LustreError: 137-5: fs1-MDT0003_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-4 kernel: LustreError: 137-5: fs1-MDT0003_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:15 dac-e-4 kernel: LustreError: 137-5: fs1-MDT0003_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-4 kernel: Lustre: 240528:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff98159a7a2d00 x1642422109238720/t0(0) o39->fs1-MDT0000-lwp-OST0024@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-4 kernel: Lustre: 241053:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff97fcbfff0480 x1642422109238736/t0(0) o39->fs1-MDT0000-lwp-OST002d@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-4 kernel: Lustre: 241080:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffff97fcbfff1680 x1642422109238752/t0(0) o39->fs1-MDT0000-lwp-OST002f@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-4 kernel: Lustre: 241130:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffff97fcbfff3a80 x1642422109238784/t0(0) o39->fs1-MDT0000-lwp-OST0028@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-4 kernel: Lustre: 241130:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:34 dac-e-4 kernel: Lustre: 241105:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339568/real 1566339568]  req@ffff98142a242400 x1642422109243344/t0(0) o39->fs1-MDT0009-lwp-OST0026@10.47.18.10@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339574 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:34 dac-e-4 kernel: Lustre: 241105:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Aug 20 23:19:35 dac-e-4 kernel: Lustre: fs1-MDT0010-lwp-OST0026: Connection to fs1-MDT0010 (at 10.47.18.17@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:35 dac-e-4 kernel: Lustre: Failing over fs1-OST0028
Aug 20 23:19:36 dac-e-4 kernel: Lustre: server umount fs1-OST0028 complete
Aug 20 23:19:37 dac-e-4 kernel: Lustre: Failing over fs1-OST002a
Aug 20 23:19:37 dac-e-4 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:37 dac-e-4 kernel: Lustre: server umount fs1-OST002a complete
Aug 20 23:19:37 dac-e-4 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:39 dac-e-4 kernel: Lustre: Failing over fs1-OST002f
Aug 20 23:19:39 dac-e-4 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:39 dac-e-4 kernel: Lustre: server umount fs1-OST002f complete
Aug 20 23:19:39 dac-e-4 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-3
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-3 kernel: Lustre: fs1-OST0018: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-3 kernel: Lustre: Skipped 8 previous similar messages
Aug 20 22:55:51 dac-e-3 kernel: Lustre: fs1-OST001d: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 22:55:51 dac-e-3 kernel: Lustre: Skipped 73 previous similar messages
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110604:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276c3f6e00
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110606:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276c3f6e00
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110603:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276c3f6e00
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110660:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276c3f6e00
Aug 20 22:55:53 dac-e-3 kernel: Lustre: 201271:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff9940fb47ad00 x1642422107577920/t0(0) o104->fs1-OST0023@10.47.21.35@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338158 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:53 dac-e-3 kernel: LNetError: 110598:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0)
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110599:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276fdcf800
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110600:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276fdcf800
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110602:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff99276fdcf800
Aug 20 22:55:53 dac-e-3 kernel: LustreError: 110598:0:(events.c:450:server_bulk_callback()) event type 5, status -103, desc ffff99276fdcf800
Aug 20 22:55:54 dac-e-3 kernel: Lustre: fs1-OST001e: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-3 kernel: LNet: 110600:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.35@o2ib1
Aug 20 22:55:54 dac-e-3 kernel: LustreError: 278399:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9928b8f0a050 x1642422175275376/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-3 kernel: Lustre: fs1-OST001e: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:55:55 dac-e-3 kernel: Lustre: fs1-OST0020: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-3 kernel: Lustre: 199329:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338155]  req@ffff9928995b4380 x1642422107582384/t0(0) o105->fs1-OST0023@10.47.21.34@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338161 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:58 dac-e-3 kernel: Lustre: fs1-OST0023: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-3 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 22:56:16 dac-e-3 kernel: Lustre: fs1-OST0019: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-3 kernel: Lustre: Skipped 204 previous similar messages
Aug 20 22:56:41 dac-e-3 kernel: Lustre: fs1-OST001e: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:41 dac-e-3 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:56:49 dac-e-3 kernel: LustreError: 282284:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff993f8dac2050 x1642422175277776/t0(0) o3->3f28bd14-132c-e629-0b42-8ea0fdd5d2a4@10.47.21.31@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-3 kernel: Lustre: fs1-OST0018: Bulk IO read error with 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1), client will retry: rc -110
Aug 20 23:01:29 dac-e-3 kernel: Lustre: fs1-OST0018: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:01:29 dac-e-3 kernel: Lustre: fs1-OST0018: Connection restored to 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1)
Aug 20 23:01:29 dac-e-3 kernel: Lustre: Skipped 245 previous similar messages
Aug 20 23:09:09 dac-e-3 kernel: Lustre: fs1-OST0018: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-3 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:10 dac-e-3 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0002: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-3 kernel: Lustre: Failing over fs1-MDT0002
Aug 20 23:19:10 dac-e-3 kernel: Lustre: fs1-MDT0002: Not available for connect from 10.47.18.9@o2ib1 (stopping)
Aug 20 23:19:10 dac-e-3 kernel: Lustre: server umount fs1-MDT0002 complete
Aug 20 23:19:11 dac-e-3 kernel: LustreError: 137-5: fs1-MDT0002_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:15 dac-e-3 kernel: LustreError: 137-5: fs1-MDT0002_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-3 kernel: Lustre: 266781:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff992906363a80 x1642422109238784/t0(0) o39->fs1-MDT0000-lwp-OST0018@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-3 kernel: Lustre: 267408:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff9928f627da00 x1642422109238800/t0(0) o39->fs1-MDT0000-lwp-OST0021@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-3 kernel: Lustre: 267434:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffff9928f6279200 x1642422109238816/t0(0) o39->fs1-MDT0000-lwp-OST0023@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-3 kernel: Lustre: 267700:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffff993f67a04c80 x1642422109238848/t0(0) o39->fs1-MDT0000-lwp-OST001c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-3 kernel: Lustre: 267700:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:34 dac-e-3 kernel: Lustre: 267460:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339568/real 1566339568]  req@ffff9928f627ec00 x1642422109243408/t0(0) o39->fs1-MDT0009-lwp-OST001a@10.47.18.10@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339574 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:34 dac-e-3 kernel: Lustre: 267460:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Aug 20 23:19:35 dac-e-3 kernel: Lustre: fs1-MDT0010-lwp-OST001f: Connection to fs1-MDT0010 (at 10.47.18.17@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:35 dac-e-3 kernel: Lustre: Failing over fs1-OST001c
Aug 20 23:19:36 dac-e-3 kernel: Lustre: server umount fs1-OST001c complete
Aug 20 23:19:37 dac-e-3 kernel: Lustre: Failing over fs1-OST001e
Aug 20 23:19:37 dac-e-3 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:37 dac-e-3 kernel: Lustre: server umount fs1-OST001e complete
Aug 20 23:19:37 dac-e-3 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:39 dac-e-3 kernel: Lustre: Failing over fs1-OST0023
Aug 20 23:19:39 dac-e-3 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:39 dac-e-3 kernel: Lustre: server umount fs1-OST0023 complete
Aug 20 23:19:39 dac-e-3 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-5
-------------------------------------------------------------------------------
-- Logs begin at Thu 2019-05-09 08:54:36 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-5 kernel: Lustre: fs1-OST0030: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-5 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 22:55:51 dac-e-5 kernel: Lustre: fs1-OST0031: Connection restored to 9a3f8291-8d90-e819-b6ab-9b2c8e66825a (at 10.47.20.69@o2ib1)
Aug 20 22:55:51 dac-e-5 kernel: Lustre: Skipped 79 previous similar messages
Aug 20 22:55:53 dac-e-5 kernel: Lustre: 288170:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338153]  req@ffff883dcdb4ba80 x1642422107579168/t0(0) o104->fs1-OST003a@10.47.21.35@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:54 dac-e-5 kernel: LustreError: 189637:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff882533a02800
Aug 20 22:55:54 dac-e-5 kernel: LustreError: 189639:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff882533a02800
Aug 20 22:55:54 dac-e-5 kernel: LustreError: 189636:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff882533a02800
Aug 20 22:56:49 dac-e-5 kernel: LustreError: 313797:0:(ldlm_lib.c:3259:target_bulk_io()) @@@ network error on bulk READ  req@ffff8824ae210050 x1642422175275008/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-5 kernel: Lustre: fs1-OST0036: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 23:01:37 dac-e-5 kernel: Lustre: fs1-OST0033: Connection restored to efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1)
Aug 20 23:01:37 dac-e-5 kernel: Lustre: Skipped 25 previous similar messages
Aug 20 23:03:38 dac-e-5 kernel: Lustre: fs1-OST0036: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-5 kernel: Lustre: fs1-OST0031: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-5 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:11 dac-e-5 kernel: LustreError: 11-0: fs1-MDT0010-osp-MDT0004: operation mds_statfs to node 10.47.18.17@o2ib1 failed: rc = -107
Aug 20 23:19:11 dac-e-5 kernel: Lustre: fs1-MDT0010-osp-MDT0004: Connection to fs1-MDT0010 (at 10.47.18.17@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:11 dac-e-5 kernel: LustreError: 11-0: fs1-MDT0012-osp-MDT0004: operation mds_statfs to node 10.47.18.19@o2ib1 failed: rc = -107
Aug 20 23:19:11 dac-e-5 kernel: LustreError: Skipped 6 previous similar messages
Aug 20 23:19:11 dac-e-5 kernel: Lustre: fs1-MDT0012-osp-MDT0004: Connection to fs1-MDT0012 (at 10.47.18.19@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:11 dac-e-5 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 23:19:13 dac-e-5 kernel: LustreError: 11-0: fs1-MDT0013-osp-MDT0004: operation mds_statfs to node 10.47.18.20@o2ib1 failed: rc = -107
Aug 20 23:19:13 dac-e-5 kernel: LustreError: Skipped 12 previous similar messages
Aug 20 23:19:13 dac-e-5 kernel: Lustre: fs1-MDT0013-osp-MDT0004: Connection to fs1-MDT0013 (at 10.47.18.20@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:13 dac-e-5 kernel: Lustre: Skipped 12 previous similar messages
Aug 20 23:19:16 dac-e-5 kernel: LustreError: 11-0: fs1-MDT0007-osp-MDT0004: operation mds_statfs to node 10.47.18.8@o2ib1 failed: rc = -107
Aug 20 23:19:16 dac-e-5 kernel: Lustre: fs1-MDT0007-osp-MDT0004: Connection to fs1-MDT0007 (at 10.47.18.8@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:16 dac-e-5 kernel: Lustre: 342559:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339550/real 1566339550]  req@ffff88255061a400 x1642422109233536/t0(0) o39->fs1-MDT0000-lwp-MDT0004@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339556 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:16 dac-e-5 kernel: Lustre: Failing over fs1-MDT0004
Aug 20 23:19:17 dac-e-5 kernel: Lustre: server umount fs1-MDT0004 complete
Aug 20 23:19:24 dac-e-5 kernel: Lustre: 342597:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff883c5702f500 x1642422109244112/t0(0) o39->fs1-MDT0000-lwp-OST0030@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-5 kernel: Lustre: 342625:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff88255061cc80 x1642422109244128/t0(0) o39->fs1-MDT0000-lwp-OST0039@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-5 kernel: Lustre: 343608:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339562/real 1566339562]  req@ffff88255061d100 x1642422109244160/t0(0) o39->fs1-MDT0000-lwp-OST0032@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339568 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:28 dac-e-5 kernel: Lustre: 343608:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 23:19:31 dac-e-5 kernel: Lustre: fs1-MDT000a-lwp-OST0037: Connection to fs1-MDT000a (at 10.47.18.11@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:31 dac-e-5 kernel: Lustre: Failing over fs1-OST0035
Aug 20 23:19:32 dac-e-5 kernel: Lustre: server umount fs1-OST0035 complete
Aug 20 23:19:33 dac-e-5 kernel: Lustre: Failing over fs1-OST0037
Aug 20 23:19:33 dac-e-5 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:33 dac-e-5 kernel: Lustre: 343033:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339567/real 1566339567]  req@ffff8825cf633a80 x1642422109248784/t0(0) o39->fs1-MDT0009-lwp-OST003b@10.47.18.10@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339573 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:33 dac-e-5 kernel: Lustre: 343033:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 273 previous similar messages
Aug 20 23:19:33 dac-e-5 kernel: Lustre: server umount fs1-OST0037 complete
Aug 20 23:19:33 dac-e-5 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 23:19:35 dac-e-5 kernel: Lustre: Failing over fs1-OST0034
Aug 20 23:19:35 dac-e-5 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:36 dac-e-5 kernel: Lustre: server umount fs1-OST0034 complete
Aug 20 23:19:36 dac-e-5 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-7
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-05-07 14:07:28 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-7 kernel: Lustre: fs1-OST0048: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-7 kernel: Lustre: Skipped 11 previous similar messages
Aug 20 22:55:51 dac-e-7 kernel: Lustre: fs1-OST004c: Connection restored to d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1)
Aug 20 22:55:51 dac-e-7 kernel: Lustre: fs1-OST004e: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 22:55:51 dac-e-7 kernel: Lustre: Skipped 74 previous similar messages
Aug 20 22:55:53 dac-e-7 kernel: Lustre: fs1-OST004b: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55109:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55110:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55118:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55107:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55108:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55109:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55110:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55118:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55110:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55118:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55109:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55110:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55109:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55108:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55110:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:53 dac-e-7 kernel: LustreError: 55107:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966ef1a7ac00
Aug 20 22:55:54 dac-e-7 kernel: LustreError: 55114:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966fd5b84400
Aug 20 22:55:54 dac-e-7 kernel: LustreError: 55113:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff966fd5b84400
Aug 20 22:55:54 dac-e-7 kernel: Lustre: fs1-MDT0006: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 22:55:55 dac-e-7 kernel: Lustre: 144619:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338153/real 1566338155]  req@ffff9687d76bda00 x1642422107581104/t0(0) o13->fs1-OST00cb-osc-MDT0006@10.47.18.17@o2ib1:7/4 lens 224/368 e 0 to 1 dl 1566338160 ref 1 fl Rpc:ReX/0/ffffffff rc 0/-1
Aug 20 22:55:55 dac-e-7 kernel: Lustre: fs1-OST00cb-osc-MDT0006: Connection to fs1-OST00cb (at 10.47.18.17@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 22:55:56 dac-e-7 kernel: Lustre: fs1-OST004c: Client 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-7 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 22:55:56 dac-e-7 kernel: LustreError: 146103:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff96877147a850 x1642422175274576/t0(0) o3->4f2faf4f-1754-0923-7bb5-26c935576df5@10.47.21.36@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-7 kernel: LustreError: 146103:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 1 previous similar message
Aug 20 22:55:56 dac-e-7 kernel: Lustre: fs1-OST004c: Bulk IO read error with 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-7 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:23 dac-e-7 kernel: Lustre: fs1-OST004c: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:56:24 dac-e-7 kernel: LustreError: 237376:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff966f60702050 x1642422175275328/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:24 dac-e-7 kernel: Lustre: fs1-OST004c: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 23:03:07 dac-e-7 kernel: Lustre: fs1-OST0048: Connection restored to d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1)
Aug 20 23:03:07 dac-e-7 kernel: Lustre: Skipped 38 previous similar messages
Aug 20 23:03:25 dac-e-7 kernel: Lustre: fs1-OST0049: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:03:25 dac-e-7 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:09:09 dac-e-7 kernel: Lustre: fs1-OST004f: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-7 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 23:19:10 dac-e-7 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0006: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-7 kernel: Lustre: Failing over fs1-MDT0006
Aug 20 23:19:10 dac-e-7 kernel: Lustre: fs1-MDT0006: Not available for connect from 10.47.18.9@o2ib1 (stopping)
Aug 20 23:19:11 dac-e-7 kernel: Lustre: server umount fs1-MDT0006 complete
Aug 20 23:19:11 dac-e-7 kernel: LustreError: 137-5: fs1-MDT0006_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:13 dac-e-7 kernel: LustreError: 137-5: fs1-MDT0006_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-7 kernel: Lustre: 251604:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff966e032cf500 x1642422109233776/t0(0) o39->fs1-MDT0000-lwp-OST0048@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-7 kernel: Lustre: 252621:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff966dabeb7980 x1642422109233792/t0(0) o39->fs1-MDT0000-lwp-OST0051@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-7 kernel: Lustre: 144615:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff966f79e54800 x1642422109237904/t0(0) o400->fs1-MDT0002-lwp-OST004d@10.47.18.3@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-7 kernel: Lustre: fs1-MDT0002-lwp-OST004d: Connection to fs1-MDT0002 (at 10.47.18.3@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:27 dac-e-7 kernel: Lustre: Failing over fs1-OST0053
Aug 20 23:19:27 dac-e-7 kernel: Lustre: server umount fs1-OST0053 complete
Aug 20 23:19:28 dac-e-7 kernel: Lustre: Failing over fs1-OST004a
Aug 20 23:19:28 dac-e-7 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:28 dac-e-7 kernel: Lustre: server umount fs1-OST004a complete
Aug 20 23:19:28 dac-e-7 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-7 kernel: Lustre: 252698:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffff9686c18d5100 x1642422109238432/t0(0) o39->fs1-MDT0000-lwp-OST004c@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-7 kernel: Lustre: 252698:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 274 previous similar messages
Aug 20 23:19:31 dac-e-7 kernel: Lustre: Failing over fs1-OST004e
Aug 20 23:19:31 dac-e-7 kernel: Lustre: Skipped 4 previous similar messages
Aug 20 23:19:31 dac-e-7 kernel: Lustre: server umount fs1-OST004e complete
Aug 20 23:19:31 dac-e-7 kernel: Lustre: Skipped 4 previous similar messages

HOSTS -------------------------------------------------------------------------
dac-e-24
-------------------------------------------------------------------------------
-- Logs begin at Wed 2019-05-08 14:07:47 BST, end at Wed 2019-08-21 11:34:55 BST. --
Aug 20 22:50:13 dac-e-24 kernel: Lustre: fs1-OST0114: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 22:50:13 dac-e-24 kernel: Lustre: Skipped 11 previous similar messages
Aug 20 22:55:51 dac-e-24 kernel: Lustre: fs1-OST011a: Connection restored to ba6d10d8-29ae-af16-bb29-fe0009074454 (at 10.47.21.33@o2ib1)
Aug 20 22:55:51 dac-e-24 kernel: Lustre: fs1-OST0116: Connection restored to 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1)
Aug 20 22:55:51 dac-e-24 kernel: Lustre: Skipped 78 previous similar messages
Aug 20 22:55:53 dac-e-24 kernel: Lustre: fs1-OST0119: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:53 dac-e-24 kernel: LustreError: 118330:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8895a77c3050 x1642422175275760/t0(0) o3->4f2faf4f-1754-0923-7bb5-26c935576df5@10.47.21.36@o2ib1:62/0 lens 488/440 e 0 to 0 dl 1566338162 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:53 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:53 dac-e-24 kernel: LustreError: 429611:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab73606800
Aug 20 22:55:54 dac-e-24 kernel: Lustre: fs1-OST0118: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:54 dac-e-24 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429615:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429614:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429614:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429613:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 114535:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff889337610050 x1642422175273808/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429614:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429615:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429666:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429613:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff889565ac3200
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429615:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429613:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429609:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429610:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429665:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429608:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429610:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429608:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429611:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff887e50fc9400
Aug 20 22:55:54 dac-e-24 kernel: LNet: 429609:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.35@o2ib1
Aug 20 22:55:54 dac-e-24 kernel: LNet: 429609:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 2 previous similar messages
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429666:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab37d7ea00
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429615:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab37d7ea00
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab37d7ea00
Aug 20 22:55:54 dac-e-24 kernel: LustreError: 429614:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab37d7ea00
Aug 20 22:55:54 dac-e-24 kernel: Lustre: fs1-OST011a: Bulk IO read error with 4f2faf4f-1754-0923-7bb5-26c935576df5 (at 10.47.21.36@o2ib1), client will retry: rc -110
Aug 20 22:55:55 dac-e-24 kernel: Lustre: 58798:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338151/real 1566338155]  req@ffff88ab98cda400 x1642422107576880/t0(0) o104->fs1-OST011b@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429666:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429613:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429613:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429614:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429612:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429613:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:55 dac-e-24 kernel: LustreError: 429614:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffff88ab7366ae00
Aug 20 22:55:56 dac-e-24 kernel: Lustre: fs1-OST011e: Client d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1) reconnecting
Aug 20 22:55:56 dac-e-24 kernel: LustreError: 114534:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff88933761a050 x1642422175273920/t0(0) o3->d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5@10.47.21.35@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:55:56 dac-e-24 kernel: Lustre: fs1-OST011e: Bulk IO read error with d51aba7f-3e9b-b409-76a8-ec3ceb1b1ca5 (at 10.47.21.35@o2ib1), client will retry: rc -110
Aug 20 22:55:56 dac-e-24 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:55:58 dac-e-24 kernel: LNet: 429607:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 1 seconds
Aug 20 22:55:58 dac-e-24 kernel: Lustre: 57959:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338154/real 1566338158]  req@ffff889398f0ba80 x1642422107581728/t0(0) o105->fs1-OST0119@10.47.21.34@o2ib1:15/16 lens 360/224 e 0 to 1 dl 1566338165 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 22:55:58 dac-e-24 kernel: Lustre: 57959:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Aug 20 22:55:58 dac-e-24 kernel: LustreError: 429607:0:(events.c:450:server_bulk_callback()) event type 5, status -110, desc ffff889565ac3200
Aug 20 22:55:58 dac-e-24 kernel: Lustre: fs1-OST0118: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:55:58 dac-e-24 kernel: LNet: 429615:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.47.21.34@o2ib1
Aug 20 22:55:58 dac-e-24 kernel: LNet: 429615:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 4 previous similar messages
Aug 20 22:56:00 dac-e-24 kernel: LNet: 429607:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.47.21.34@o2ib1: 0 seconds
Aug 20 22:56:00 dac-e-24 kernel: LNet: 429607:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Skipped 1 previous similar message
Aug 20 22:56:00 dac-e-24 kernel: Lustre: 59167:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566338155/real 1566338160]  req@ffff887e9e7f5580 x1642422107576896/t0(0) o104->fs1-OST011a@10.47.21.34@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1
Aug 20 22:56:01 dac-e-24 kernel: LNet: 429607:0:(o2iblnd_cb.c:1495:kiblnd_reconnect_peer()) Abort reconnection of 10.47.21.34@o2ib1: connected
Aug 20 22:56:02 dac-e-24 kernel: Lustre: 58797:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566338151/real 1566338154]  req@ffff88ab8f3f2d00 x1642422107576736/t0(0) o104->fs1-OST0119@10.47.21.35@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1566338162 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 22:56:16 dac-e-24 kernel: Lustre: fs1-OST0114: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:16 dac-e-24 kernel: LustreError: 118325:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff88951ff79050 x1642422175273488/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:94/0 lens 488/440 e 0 to 0 dl 1566338194 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:16 dac-e-24 kernel: LustreError: 118325:0:(ldlm_lib.c:3253:target_bulk_io()) Skipped 1 previous similar message
Aug 20 22:56:16 dac-e-24 kernel: Lustre: fs1-OST011a: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:16 dac-e-24 kernel: Lustre: Skipped 1 previous similar message
Aug 20 22:56:41 dac-e-24 kernel: Lustre: fs1-OST011c: Client efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1) reconnecting
Aug 20 22:56:41 dac-e-24 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 22:56:41 dac-e-24 kernel: LustreError: 207375:0:(ldlm_lib.c:3253:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff8894ebb4b850 x1642422175273408/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:119/0 lens 488/440 e 1 to 0 dl 1566338219 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:41 dac-e-24 kernel: Lustre: fs1-OST011c: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 22:56:49 dac-e-24 kernel: LustreError: 118329:0:(ldlm_lib.c:3268:target_bulk_io()) @@@ truncated bulk READ 15728640(16777216)  req@ffff88951ff7d850 x1642422175280912/t0(0) o3->efd44c75-059d-5b3e-f4e6-657896b473fc@10.47.21.34@o2ib1:126/0 lens 488/440 e 1 to 0 dl 1566338226 ref 1 fl Interpret:/0/0 rc 0/0
Aug 20 22:56:49 dac-e-24 kernel: Lustre: fs1-OST0118: Bulk IO read error with efd44c75-059d-5b3e-f4e6-657896b473fc (at 10.47.21.34@o2ib1), client will retry: rc -110
Aug 20 23:00:46 dac-e-24 kernel: Lustre: fs1-OST011e: Connection restored to b70bd8d1-bef1-cafa-1d50-bfa93684ff22 (at 10.47.21.37@o2ib1)
Aug 20 23:00:46 dac-e-24 kernel: Lustre: Skipped 40 previous similar messages
Aug 20 23:03:25 dac-e-24 kernel: Lustre: fs1-OST011d: Client 3f28bd14-132c-e629-0b42-8ea0fdd5d2a4 (at 10.47.21.31@o2ib1) reconnecting
Aug 20 23:09:09 dac-e-24 kernel: Lustre: fs1-OST0119: Connection restored to 60f2ae35-b178-1c80-ba50-e7c416f9922a (at 10.47.20.68@o2ib1)
Aug 20 23:09:09 dac-e-24 kernel: Lustre: Skipped 7 previous similar messages
Aug 20 23:19:10 dac-e-24 kernel: LustreError: 11-0: fs1-MDT0000-lwp-MDT0017: operation mds_disconnect to node 10.47.18.1@o2ib1 failed: rc = -107
Aug 20 23:19:10 dac-e-24 kernel: Lustre: Failing over fs1-MDT0017
Aug 20 23:19:10 dac-e-24 kernel: LustreError: 137-5: fs1-MDT0017_UUID: not available for connect from 10.47.18.8@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:10 dac-e-24 kernel: Lustre: server umount fs1-MDT0017 complete
Aug 20 23:19:11 dac-e-24 kernel: LustreError: 137-5: fs1-MDT0017_UUID: not available for connect from 10.47.18.9@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:12 dac-e-24 kernel: LustreError: 137-5: fs1-MDT0017_UUID: not available for connect from 10.47.18.5@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Aug 20 23:19:24 dac-e-24 kernel: Lustre: 93864:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339558/real 1566339558]  req@ffff889385d2cc80 x1642422109236960/t0(0) o39->fs1-MDT0000-lwp-OST0114@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339564 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:26 dac-e-24 kernel: Lustre: 94805:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339560/real 1566339560]  req@ffff887ea3604380 x1642422109236976/t0(0) o39->fs1-MDT0000-lwp-OST011d@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339566 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-24 kernel: Lustre: 95538:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339561/real 1566339561]  req@ffff889385d2c380 x1642422109241584/t0(0) o39->fs1-MDT0000-lwp-OST011f@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339567 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:27 dac-e-24 kernel: Lustre: fs1-MDT000f-lwp-OST011b: Connection to fs1-MDT000f (at 10.47.18.16@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 23:19:27 dac-e-24 kernel: Lustre: Failing over fs1-OST011e
Aug 20 23:19:27 dac-e-24 kernel: Lustre: server umount fs1-OST011e complete
Aug 20 23:19:28 dac-e-24 kernel: Lustre: Failing over fs1-OST0115
Aug 20 23:19:28 dac-e-24 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-24 kernel: Lustre: server umount fs1-OST0115 complete
Aug 20 23:19:29 dac-e-24 kernel: Lustre: Skipped 1 previous similar message
Aug 20 23:19:29 dac-e-24 kernel: Lustre: 95728:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566339563/real 1566339563]  req@ffff8895b47fe780 x1642422109241616/t0(0) o39->fs1-MDT0000-lwp-OST0118@10.47.18.1@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566339569 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Aug 20 23:19:29 dac-e-24 kernel: Lustre: 95728:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 276 previous similar messages
Aug 20 23:19:31 dac-e-24 kernel: Lustre: Failing over fs1-OST011a
Aug 20 23:19:31 dac-e-24 kernel: Lustre: Skipped 3 previous similar messages
Aug 20 23:19:31 dac-e-24 kernel: Lustre: server umount fs1-OST011a complete
Aug 20 23:19:31 dac-e-24 kernel: Lustre: Skipped 3 previous similar messages
