HOSTS -------------------------------------------------------------------------
cpu-e-1058
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:50:13 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1058 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1058 kernel: sched: RT throttling activated
Aug 20 20:31:38 cpu-e-1058 kernel: LNetError: 59394:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:31:38 cpu-e-1058 kernel: LNetError: 59394:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.6@o2ib1 (6): c: 0, oc: 0, rc: 63
Aug 20 20:32:41 cpu-e-1058 kernel: Lustre: 59920:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566329492/real 1566329492]  req@ffff93d7d689a880 x1642414051656448/t0(0) o3->fs1-OST0043-osc-ffff93bfda720800@10.47.18.6@o2ib1:6/4 lens 488/440 e 1 to 1 dl 1566329561 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 20:32:41 cpu-e-1058 kernel: Lustre: fs1-OST0043-osc-ffff93bfda720800: Connection to fs1-OST0043 (at 10.47.18.6@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:32:41 cpu-e-1058 kernel: Lustre: fs1-OST0043-osc-ffff93bfda720800: Connection restored to 10.47.18.6@o2ib1 (at 10.47.18.6@o2ib1)
Aug 20 20:32:43 cpu-e-1058 kernel: Lustre: 59923:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566329492/real 1566329492]  req@ffff93d7d689ec00 x1642414051656480/t0(0) o3->fs1-OST0044-osc-ffff93bfda720800@10.47.18.6@o2ib1:6/4 lens 488/440 e 1 to 1 dl 1566329561 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 20:32:43 cpu-e-1058 kernel: Lustre: fs1-OST0044-osc-ffff93bfda720800: Connection to fs1-OST0044 (at 10.47.18.6@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:32:43 cpu-e-1058 kernel: Lustre: fs1-OST0044-osc-ffff93bfda720800: Connection restored to 10.47.18.6@o2ib1 (at 10.47.18.6@o2ib1)

HOSTS -------------------------------------------------------------------------
cpu-e-1054
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:50:04 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:18 cpu-e-1054 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1054 kernel: sched: RT throttling activated
Aug 20 20:22:44 cpu-e-1054 kernel: Lustre: 59864:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566328963/real 1566328964]  req@ffff8bd01b57e300 x1642414042283488/t0(0) o3->fs1-OST009b-osc-ffff8bd01ba2c000@10.47.18.13@o2ib1:6/4 lens 488/440 e 0 to 1 dl 1566329007 ref 2 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 20:22:44 cpu-e-1054 kernel: Lustre: fs1-OST009b-osc-ffff8bd01ba2c000: Connection to fs1-OST009b (at 10.47.18.13@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:22:44 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfbcf89c00
Aug 20 20:22:44 cpu-e-1054 kernel: LustreError: 59360:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8bcfbcf89c00
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83ca200
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83ca200
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:45 cpu-e-1054 kernel: LustreError: 59360:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8bcfbd22ac00
Aug 20 20:22:45 cpu-e-1054 kernel: Lustre: fs1-OST0081-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.11@o2ib1 (at 10.47.18.11@o2ib1)
Aug 20 20:22:46 cpu-e-1054 kernel: Lustre: 59893:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566328963/real 1566328963]  req@ffff8be81460ad00 x1642414042283056/t0(0) o3->fs1-OST0036-osc-ffff8bd01ba2c000@10.47.18.5@o2ib1:6/4 lens 488/440 e 0 to 1 dl 1566328970 ref 2 fl Rpc:eX/0/ffffffff rc 0/-1
Aug 20 20:22:46 cpu-e-1054 kernel: Lustre: 59893:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
Aug 20 20:22:46 cpu-e-1054 kernel: Lustre: fs1-OST0036-osc-ffff8bd01ba2c000: Connection to fs1-OST0036 (at 10.47.18.5@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:22:46 cpu-e-1054 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59357:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8bcfbd22ac00
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be72606b200
Aug 20 20:22:46 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfbfb8f800
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59363:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8be72606b200
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59362:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8be72606b200
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59361:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8be72606b200
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59364:0:(events.c:200:client_bulk_callback()) event type 2, status -5, desc ffff8be72606b200
Aug 20 20:22:47 cpu-e-1054 kernel: Lustre: 59886:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566328963/real 1566328963]  req@ffff8be808ed1680 x1642414042283936/t0(0) o3->fs1-OST00e0-osc-ffff8bd01ba2c000@10.47.18.19@o2ib1:6/4 lens 488/440 e 0 to 1 dl 1566329007 ref 2 fl Rpc:eXS/0/ffffffff rc -11/-1
Aug 20 20:22:47 cpu-e-1054 kernel: Lustre: 59886:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 20:22:47 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.43@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:22:47 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 76 previous similar messages
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be7255e7e00
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be7255e7e00
Aug 20 20:22:47 cpu-e-1054 kernel: Lustre: fs1-OST00b6-osc-ffff8bd01ba2c000: Connection to fs1-OST00b6 (at 10.47.18.16@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:22:47 cpu-e-1054 kernel: Lustre: Skipped 1 previous similar message
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be7255e7e00
Aug 20 20:22:47 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be7255e7e00
Aug 20 20:22:48 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Aug 20 20:22:48 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.14@o2ib1 (5): c: 60, oc: 1, rc: 63
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725280e00
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:48 cpu-e-1054 kernel: Lustre: fs1-OST00a4-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.14@o2ib1 (at 10.47.18.14@o2ib1)
Aug 20 20:22:48 cpu-e-1054 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:48 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726af9000
Aug 20 20:22:49 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
Aug 20 20:22:49 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 3 previous similar messages
Aug 20 20:22:49 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.48@o2ib1 (6): c: 63, oc: 4, rc: 63
Aug 20 20:22:49 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 3 previous similar messages
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725613a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725613a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725613a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725074200
Aug 20 20:22:49 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.48@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:22:49 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 7 previous similar messages
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725074200
Aug 20 20:22:49 cpu-e-1054 kernel: Lustre: fs1-OST011a-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.24@o2ib1 (at 10.47.18.24@o2ib1)
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725074200
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be725074200
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726628a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726628a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726628a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be726628a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8400
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8400
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8400
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8400
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfc12b1a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfc12b1a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfc12b1a00
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfc12b1a00
Aug 20 20:22:49 cpu-e-1054 kernel: Lustre: 59880:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566328963/real 1566328963]  req@ffff8bd01404d100 x1642414042283328/t0(0) o3->fs1-OST00f7-osc-ffff8bd01ba2c000@10.47.18.21@o2ib1:6/4 lens 488/440 e 0 to 1 dl 1566329007 ref 2 fl Rpc:eXS/0/ffffffff rc -11/-1
Aug 20 20:22:49 cpu-e-1054 kernel: Lustre: 59880:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8200
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8200
Aug 20 20:22:49 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8200
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bd004eb8200
Aug 20 20:22:50 cpu-e-1054 kernel: Lustre: fs1-OST0114-osc-ffff8bd01ba2c000: Connection to fs1-OST0114 (at 10.47.18.24@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:22:50 cpu-e-1054 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfbfb8f800
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcfbfb8f800
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be7255e7e00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8be7255e7e00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83cdc00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83cdc00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83cdc00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83cdc00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83cdc00
Aug 20 20:22:50 cpu-e-1054 kernel: LustreError: 59356:0:(events.c:200:client_bulk_callback()) event type 2, status -103, desc ffff8bcea83cdc00
Aug 20 20:22:50 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.26@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:22:50 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 19 previous similar messages
Aug 20 20:23:15 cpu-e-1054 kernel: Lustre: fs1-OST002c-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.4@o2ib1 (at 10.47.18.4@o2ib1)
Aug 20 20:23:15 cpu-e-1054 kernel: Lustre: Skipped 6 previous similar messages
Aug 20 20:23:15 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.7@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:23:15 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 20 20:30:56 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:30:56 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 11 previous similar messages
Aug 20 20:30:56 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.20@o2ib1 (6): c: 0, oc: 0, rc: 63
Aug 20 20:30:56 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 11 previous similar messages
Aug 20 20:30:58 cpu-e-1054 kernel: Lustre: 59869:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566328963/real 1566328963]  req@ffff8be80272d580 x1642414042283712/t0(0) o3->fs1-OST00f1-osc-ffff8bd01ba2c000@10.47.18.21@o2ib1:6/4 lens 488/440 e 1 to 1 dl 1566329032 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 20:30:58 cpu-e-1054 kernel: Lustre: 59869:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
Aug 20 20:30:58 cpu-e-1054 kernel: Lustre: fs1-OST00f1-osc-ffff8bd01ba2c000: Connection to fs1-OST00f1 (at 10.47.18.21@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:30:58 cpu-e-1054 kernel: Lustre: Skipped 1 previous similar message
Aug 20 20:30:58 cpu-e-1054 kernel: Lustre: fs1-OST00f1-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.21@o2ib1 (at 10.47.18.21@o2ib1)
Aug 20 20:30:58 cpu-e-1054 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 20:31:05 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:31:05 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.4@o2ib1 (6): c: 0, oc: 0, rc: 63
Aug 20 20:31:16 cpu-e-1054 kernel: Lustre: 59887:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566328969/real 1566328969]  req@ffff8be8145eda00 x1642414042283440/t0(0) o3->fs1-OST011a-osc-ffff8bd01ba2c000@10.47.18.24@o2ib1:6/4 lens 488/440 e 2 to 1 dl 1566329063 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Aug 20 20:31:16 cpu-e-1054 kernel: Lustre: 59887:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Aug 20 20:31:16 cpu-e-1054 kernel: Lustre: fs1-OST011a-osc-ffff8bd01ba2c000: Connection to fs1-OST011a (at 10.47.18.24@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:31:16 cpu-e-1054 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 20:31:16 cpu-e-1054 kernel: Lustre: fs1-OST011a-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.24@o2ib1 (at 10.47.18.24@o2ib1)
Aug 20 20:31:16 cpu-e-1054 kernel: Lustre: Skipped 2 previous similar messages
Aug 20 20:31:21 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:31:21 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 2 previous similar messages
Aug 20 20:31:21 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.16@o2ib1 (22): c: 0, oc: 0, rc: 63
Aug 20 20:31:21 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 2 previous similar messages
Aug 20 20:33:35 cpu-e-1054 kernel: Lustre: 59868:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566328966/real 1566328966]  req@ffff8bd016d69680 x1642414042283104/t0(0) o3->fs1-OST0029-osc-ffff8bd01ba2c000@10.47.18.4@o2ib1:6/4 lens 488/440 e 1 to 1 dl 1566329035 ref 2 fl Rpc:X/2/ffffffff rc 0/-1
Aug 20 20:33:35 cpu-e-1054 kernel: Lustre: 59868:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Aug 20 20:33:35 cpu-e-1054 kernel: Lustre: fs1-OST0029-osc-ffff8bd01ba2c000: Connection to fs1-OST0029 (at 10.47.18.4@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:33:35 cpu-e-1054 kernel: Lustre: Skipped 1 previous similar message
Aug 20 20:33:35 cpu-e-1054 kernel: Lustre: fs1-OST0029-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.4@o2ib1 (at 10.47.18.4@o2ib1)
Aug 20 20:33:35 cpu-e-1054 kernel: Lustre: Skipped 1 previous similar message
Aug 20 20:33:42 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:33:42 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.40@o2ib1 (6): c: 0, oc: 0, rc: 63
Aug 20 20:33:59 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:33:59 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 2 previous similar messages
Aug 20 20:33:59 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.40@o2ib1 (23): c: 0, oc: 0, rc: 63
Aug 20 20:33:59 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 2 previous similar messages
Aug 20 20:33:59 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.40@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:33:59 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 15 previous similar messages
Aug 20 20:36:44 cpu-e-1054 kernel: Lustre: 59878:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1566329635/real 1566329635]  req@ffff8be817005a00 x1642414048548848/t0(0) o3->fs1-OST0029-osc-ffff8bd01ba2c000@10.47.18.4@o2ib1:6/4 lens 488/440 e 3 to 1 dl 1566329717 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Aug 20 20:36:44 cpu-e-1054 kernel: Lustre: 59878:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
Aug 20 20:36:44 cpu-e-1054 kernel: Lustre: fs1-OST0029-osc-ffff8bd01ba2c000: Connection to fs1-OST0029 (at 10.47.18.4@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:36:44 cpu-e-1054 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 20:36:44 cpu-e-1054 kernel: Lustre: fs1-OST0029-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.4@o2ib1 (at 10.47.18.4@o2ib1)
Aug 20 20:36:44 cpu-e-1054 kernel: Lustre: Skipped 9 previous similar messages
Aug 20 20:36:49 cpu-e-1054 kernel: perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
Aug 20 20:36:50 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:36:50 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 2 previous similar messages
Aug 20 20:36:50 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.28@o2ib1 (6): c: 0, oc: 0, rc: 63
Aug 20 20:36:50 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 2 previous similar messages
Aug 20 20:37:09 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.40@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:37:09 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 20 20:37:28 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.28@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:37:28 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 20 20:38:02 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 0 seconds
Aug 20 20:38:02 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3335:kiblnd_check_txs_locked()) Skipped 10 previous similar messages
Aug 20 20:38:02 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Timed out RDMA with 10.47.18.14@o2ib1 (35): c: 0, oc: 0, rc: 63
Aug 20 20:38:02 cpu-e-1054 kernel: LNetError: 59356:0:(o2iblnd_cb.c:3410:kiblnd_check_conns()) Skipped 10 previous similar messages
Aug 20 20:38:02 cpu-e-1054 kernel: Lustre: 59893:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1566329877/real 1566329882]  req@ffff8be813e7d580 x1642414050387792/t0(0) o400->fs1-MDT000d-mdc-ffff8bd01ba2c000@10.47.18.14@o2ib1:12/10 lens 224/224 e 0 to 1 dl 1566329884 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
Aug 20 20:38:02 cpu-e-1054 kernel: Lustre: 59893:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
Aug 20 20:38:02 cpu-e-1054 kernel: Lustre: fs1-MDT000d-mdc-ffff8bd01ba2c000: Connection to fs1-MDT000d (at 10.47.18.14@o2ib1) was lost; in progress operations using this service will wait for recovery to complete
Aug 20 20:38:02 cpu-e-1054 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 20:38:02 cpu-e-1054 kernel: Lustre: fs1-OST00a4-osc-ffff8bd01ba2c000: Connection restored to 10.47.18.14@o2ib1 (at 10.47.18.14@o2ib1)
Aug 20 20:38:02 cpu-e-1054 kernel: Lustre: Skipped 5 previous similar messages
Aug 20 20:38:02 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.14@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:38:02 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 20 20:38:28 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.40@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:38:28 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 20 20:38:49 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.28@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:38:49 cpu-e-1054 kernel: LNet: 19753:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 3 previous similar messages
Aug 20 20:39:27 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) peer 10.47.18.40@o2ib1 - queue depth reduced from 128 to 63  to allow for qp creation
Aug 20 20:39:27 cpu-e-1054 kernel: LNet: 63427:0:(o2iblnd.c:941:kiblnd_create_conn()) Skipped 7 previous similar messages

HOSTS -------------------------------------------------------------------------
cpu-e-1056
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:50:00 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1056 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1056 kernel: sched: RT throttling activated

HOSTS -------------------------------------------------------------------------
cpu-e-1061
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:49:58 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1061 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1061 kernel: sched: RT throttling activated

HOSTS -------------------------------------------------------------------------
cpu-e-1057
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:50:20 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1057 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1057 kernel: sched: RT throttling activated

HOSTS -------------------------------------------------------------------------
cpu-e-1059
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:50:03 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1059 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1059 kernel: sched: RT throttling activated
Aug 20 20:21:24 cpu-e-1059 kernel: ib0: Budget exhausted after napi rescheduled

HOSTS -------------------------------------------------------------------------
cpu-e-837
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:49:07 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:11:46 cpu-e-837 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-837 kernel: sched: RT throttling activated
Aug 20 20:21:25 cpu-e-837 kernel: ib0: Budget exhausted after napi rescheduled

HOSTS -------------------------------------------------------------------------
cpu-e-1055
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:50:34 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1055 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1055 kernel: sched: RT throttling activated

HOSTS -------------------------------------------------------------------------
cpu-e-836
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:48:44 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:11:46 cpu-e-836 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-836 kernel: sched: RT throttling activated

HOSTS -------------------------------------------------------------------------
cpu-e-1060
-------------------------------------------------------------------------------
-- Logs begin at Tue 2019-08-20 19:49:46 BST, end at Wed 2019-08-21 11:17:46 BST. --
Aug 20 20:12:21 cpu-e-1060 kernel: Adding 15999996k swap on /dev/sda2.  Priority:-2 extents:1 across:15999996k SSFS
Aug 20 20:14:37 cpu-e-1060 kernel: sched: RT throttling activated
