Jun 03 18:42:58 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
Jun 03 18:42:58 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib)
Jun 03 18:42:59 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.5_RC1
Jun 03 18:42:59 fir-md1-s2 kernel: LNet: 20334:0:(config.c:1642:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down
Jun 03 18:42:59 fir-md1-s2 kernel: LNet: Using FastReg for registration
Jun 03 18:42:59 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180]
Jun 03 18:43:00 fir-md1-s2 kernel: LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect:321: MMP interval 42 higher than expected, please wait.
Jun 03 18:43:42 fir-md1-s2 kernel: LDISKFS-fs (dm-0): file extents enabled, maximum tree depth=5
Jun 03 18:44:18 fir-md1-s2 kernel: LDISKFS-fs (dm-0): recovery complete
Jun 03 18:44:18 fir-md1-s2 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Jun 03 18:44:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
Jun 03 18:44:19 fir-md1-s2 kernel: Lustre: 20461:0:(llog_cat.c:1064:llog_cat_reverse_process()) fir-MDD0001: catalog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:44:19 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on
Jun 03 18:44:19 fir-md1-s2 kernel: Lustre: 20461:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:44:19 fir-md1-s2 kernel: Lustre: 20461:0:(mdd_device.c:545:mdd_changelog_llog_init()) fir-MDD0001 : orphan changelog records found, starting from index 24071217800 to index 24753450867, being cleared now
Jun 03 18:44:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:44:36 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.49.22.29@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:45:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.4.57@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:45:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:45:27 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:46:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:46:14 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message
Jun 03 18:47:33 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:47:33 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message
Jun 03 18:48:44 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:48:44 fir-md1-s2 kernel: LustreError: Skipped 2 previous similar messages
Jun 03 18:50:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:50:12 fir-md1-s2 kernel: LustreError: Skipped 3 previous similar messages
Jun 03 18:50:34 fir-md1-s2 kernel: Lustre: 21042:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:50:34 fir-md1-s2 kernel: Lustre: 21042:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 1 previous similar message
Jun 03 18:51:15 fir-md1-s2 kernel: Lustre: 21068:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:52:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:52:42 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages
Jun 03 18:53:10 fir-md1-s2 kernel: Lustre: 21091:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:55:42 fir-md1-s2 kernel: Lustre: 21126:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:56:03 fir-md1-s2 kernel: Lustre: 21131:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 18:57:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 18:57:10 fir-md1-s2 kernel: LustreError: Skipped 7 previous similar messages
Jun 03 18:57:47 fir-md1-s2 kernel: Lustre: 21151:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:00:05 fir-md1-s2 kernel: Lustre: 21186:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:00:05 fir-md1-s2 kernel: Lustre: 21186:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 2 previous similar messages
Jun 03 19:05:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 19:05:53 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages
Jun 03 19:09:55 fir-md1-s2 kernel: Lustre: 21314:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:16:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 19:16:45 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages
Jun 03 19:26:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 19:26:46 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages
Jun 03 19:31:00 fir-md1-s2 kernel: Lustre: 21567:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:37:01 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 19:37:01 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages
Jun 03 19:41:22 fir-md1-s2 kernel: Lustre: 21694:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:41:22 fir-md1-s2 kernel: Lustre: 21694:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 3 previous similar messages
Jun 03 19:43:18 fir-md1-s2 kernel: Lustre: 21718:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:45:03 fir-md1-s2 kernel: Lustre: 21757:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:45:03 fir-md1-s2 kernel: Lustre: 21757:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 4 previous similar messages
Jun 03 19:47:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 19:47:16 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages
Jun 03 19:47:32 fir-md1-s2 kernel: Lustre: 21790:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:47:32 fir-md1-s2 kernel: Lustre: 21790:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 2 previous similar messages
Jun 03 19:53:00 fir-md1-s2 kernel: Lustre: 21868:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 19:53:00 fir-md1-s2 kernel: Lustre: 21868:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 9 previous similar messages
Jun 03 19:58:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 19:58:08 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages
Jun 03 20:02:41 fir-md1-s2 kernel: Lustre: 22093:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 20:02:41 fir-md1-s2 kernel: Lustre: 22093:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 34 previous similar messages
Jun 03 20:08:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 20:08:10 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages
Jun 03 20:13:02 fir-md1-s2 kernel: Lustre: 22231:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 20:13:02 fir-md1-s2 kernel: Lustre: 22231:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 12 previous similar messages
Jun 03 20:18:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 20:18:42 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages
Jun 03 20:23:05 fir-md1-s2 kernel: Lustre: 22454:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 20:23:05 fir-md1-s2 kernel: Lustre: 22454:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 30 previous similar messages
Jun 03 20:28:52 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 20:28:52 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages
Jun 03 20:33:06 fir-md1-s2 kernel: Lustre: 22636:0:(llog_cat.c:899:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x2a4:0xa:0x0] crosses index zero
Jun 03 20:33:06 fir-md1-s2 kernel: Lustre: 22636:0:(llog_cat.c:899:llog_cat_process_or_fork()) Skipped 61 previous similar messages
Jun 03 20:38:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect
Jun 03 20:38:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1306 clients reconnect
Jun 03 20:38:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.7.36@o2ib2)
Jun 03 20:38:14 fir-md1-s2 kernel: Lustre: Skipped 52 previous similar messages
Jun 03 20:38:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.20.35@o2ib1)
Jun 03 20:38:15 fir-md1-s2 kernel: Lustre: Skipped 25 previous similar messages
Jun 03 20:38:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.36@o2ib2)
Jun 03 20:38:16 fir-md1-s2 kernel: Lustre: Skipped 36 previous similar messages
Jun 03 20:38:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.3.39@o2ib2)
Jun 03 20:38:18 fir-md1-s2 kernel: Lustre: Skipped 76 previous similar messages
Jun 03 20:38:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.58@o2ib2)
Jun 03 20:38:22 fir-md1-s2 kernel: Lustre: Skipped 225 previous similar messages
Jun 03 20:38:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (540 recovered, 8 in progress, and 0 evicted) to recover in 14:23
Jun 03 20:38:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.30.20@o2ib1)
Jun 03 20:38:30 fir-md1-s2 kernel: Lustre: Skipped 434 previous similar messages
Jun 03 20:38:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 20:38:46 fir-md1-s2 kernel: Lustre: Skipped 542 previous similar messages
Jun 03 20:38:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 55 in progress, and 0 evicted) to recover in 13:57
Jun 03 20:38:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 20:38:58 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages
Jun 03 20:39:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 13:32
Jun 03 20:39:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 13:07
Jun 03 20:40:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 12:42
Jun 03 20:40:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 12:17
Jun 03 20:40:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 11:52
Jun 03 20:41:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnected, waiting for 1306 clients in recovery for 11:40
Jun 03 20:41:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1)
Jun 03 20:41:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 20:41:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnected, waiting for 1306 clients in recovery for 11:30
Jun 03 20:41:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 11:02
Jun 03 20:41:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 20:43:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 570cf4c9-787b-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 9:47
Jun 03 20:43:01 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Jun 03 20:43:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnected, waiting for 1306 clients in recovery for 9:09
Jun 03 20:43:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1)
Jun 03 20:43:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 20:43:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnected, waiting for 1306 clients in recovery for 9:00
Jun 03 20:46:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnected, waiting for 1306 clients in recovery for 6:39
Jun 03 20:46:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1)
Jun 03 20:46:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 20:46:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnected, waiting for 1306 clients in recovery for 6:29
Jun 03 20:48:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 2e2a0151-2452-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 4:43
Jun 03 20:48:04 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Jun 03 20:48:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnected, waiting for 1306 clients in recovery for 4:08
Jun 03 20:49:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 20:49:13 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages
Jun 03 20:50:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnected, waiting for 1306 clients in recovery for 2:19
Jun 03 20:50:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 20:50:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 20:50:29 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Jun 03 20:52:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnected, waiting for 1306 clients in recovery for 0:38
Jun 03 20:52:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 20:52:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 2e2a0151-2452-4 (at 10.49.27.27@o2ib1), waiting for 1306 known clients (1249 recovered, 56 in progress, and 0 evicted) to recover in 0:19
Jun 03 20:52:29 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages
Jun 03 20:52:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: recovery is timed out, evict stale exports
Jun 03 20:52:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: disconnecting 1 stale clients
Jun 03 20:52:48 fir-md1-s2 kernel: Lustre: 22713:0:(ldlm_lib.c:1782:extend_recovery_timer()) fir-MDT0001: extended recovery timer reached hard limit: 900, extend: 1
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: recovery is timed out, evict stale exports
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: disconnecting 2 stale clients
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: 22713:0:(ldlm_lib.c:1782:extend_recovery_timer()) fir-MDT0001: extended recovery timer reached hard limit: 900, extend: 1
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: 22713:0:(ldlm_lib.c:2063:target_recovery_overseer()) fir-MDT0001 recovery is aborted by hard timeout
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: 22713:0:(ldlm_lib.c:2073:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: 22713:0:(ldlm_lib.c:1616:abort_req_replay_queue()) @@@ aborted: req@ffff8d246e963180 x1663398908753664/t0(639966616953) o35->b846ec69-3763-4@10.50.2.49@o2ib2:282/0 lens 392/0 e 36 to 0 dl 1591242812 ref 1 fl Complete:/4/ffffffff rc 0/-1
Jun 03 20:53:13 fir-md1-s2 kernel: LustreError: 22713:0:(ldlm_lib.c:1637:abort_lock_replay_queue()) @@@ aborted: req@ffff8d247352ba80 x1660888113275072/t0(0) o101->d52f107d-43b3-4@10.50.10.34@o2ib2:286/0 lens 328/0 e 6 to 0 dl 1591242816 ref 1 fl Complete:/40/ffffffff rc 0/-1
Jun 03 20:53:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 15:00, of 1306 clients 1249 recovered and 57 were evicted.
Jun 03 20:55:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 20:55:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 20:57:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 20:57:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1312f61c00, cur 1591243047 expire 1591242897 last 1591242820
Jun 03 20:59:15 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 20:59:15 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages
Jun 03 21:00:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 21:00:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 21:00:06 fir-md1-s2 kernel: Lustre: Skipped 64 previous similar messages
Jun 03 21:00:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 21:01:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d12da6cb400, cur 1591243307 expire 1591243157 last 1591243080
Jun 03 21:01:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:04:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 21:04:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 21:05:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d24755eb800, cur 1591243558 expire 1591243408 last 1591243331
Jun 03 21:05:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:09:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 21:09:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 21:09:42 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages
Jun 03 21:10:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1462495c00, cur 1591243834 expire 1591243684 last 1591243607
Jun 03 21:10:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:11:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1)
Jun 03 21:11:39 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages
Jun 03 21:14:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 21:14:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:15:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf449667c00, cur 1591244126 expire 1591243976 last 1591243899
Jun 03 21:15:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:18:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 21:18:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:19:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 21:19:45 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages
Jun 03 21:20:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0467eee000, cur 1591244427 expire 1591244277 last 1591244200
Jun 03 21:20:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:21:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1)
Jun 03 21:21:41 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages
Jun 03 21:23:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting
Jun 03 21:23:22 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Jun 03 21:25:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2479b74800, cur 1591244728 expire 1591244578 last 1591244501
Jun 03 21:25:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:29:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d135f69ac00, cur 1591244988 expire 1591244838 last 1591244761
Jun 03 21:29:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:30:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 21:30:24 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages
Jun 03 21:32:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 21:32:18 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Jun 03 21:32:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 21:32:18 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages
Jun 03 21:33:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d10f83c8c00, cur 1591245239 expire 1591245089 last 1591245012
Jun 03 21:33:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:38:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d145fec9400, cur 1591245515 expire 1591245365 last 1591245288
Jun 03 21:38:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:40:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 21:40:48 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages
Jun 03 21:42:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 21:42:20 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Jun 03 21:42:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 21:42:20 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages
Jun 03 21:43:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d240c30e400, cur 1591245807 expire 1591245657 last 1591245580
Jun 03 21:43:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Jun 03 21:50:50 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 21:50:50 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages
Jun 03 21:52:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 21:52:22 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Jun 03 21:52:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 21:52:22 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages
Jun 03 21:52:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d047766fc00, cur 1591246368 expire 1591246218 last 1591246141
Jun 03 21:52:48 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Jun 03 22:01:17 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 22:01:17 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages
Jun 03 22:02:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 22:02:49 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages
Jun 03 22:05:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 22:05:20 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Jun 03 22:06:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d04776e1800, cur 1591247196 expire 1591247046 last 1591246969
Jun 03 22:06:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Jun 03 22:11:19 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 22:11:19 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages
Jun 03 22:12:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1)
Jun 03 22:12:51 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages
Jun 03 22:15:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 22:15:22 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Jun 03 22:20:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d12c4a50000, cur 1591248049 expire 1591247899 last 1591247822
Jun 03 22:20:49 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Jun 03 22:22:23 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 22:22:23 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages
Jun 03 22:23:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1)
Jun 03 22:23:34 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages
Jun 03 22:28:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting
Jun 03 22:28:44 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Jun 03 22:32:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
Jun 03 22:32:42 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 03 22:33:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 03 22:33:36 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 03 22:34:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0468db6c00, cur 1591248877 expire 1591248727 last 1591248650 Jun 03 22:34:37 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 03 22:42:52 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 03 22:42:52 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 03 22:43:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 03 22:43:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 03 22:45:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 03 22:45:03 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 03 22:48:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce6e03dd400, cur 1591249730 expire 1591249580 last 1591249503 Jun 03 22:48:50 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 03 22:53:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 03 22:53:08 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 03 22:53:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 03 22:53:15 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 03 22:55:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 03 22:55:21 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 03 23:02:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d04773f9000, cur 1591250558 expire 1591250408 last 1591250331 Jun 03 23:02:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 03 23:03:22 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 03 23:03:22 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 03 23:06:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 03 23:06:13 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 03 23:06:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 03 23:06:13 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 03 23:13:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 03 23:13:24 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 03 23:16:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 03 23:16:15 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 03 23:16:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 03 23:16:15 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 03 23:16:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1187bbd400, cur 1591251411 expire 1591251261 last 1591251184 Jun 03 23:16:51 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 03 23:23:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 03 23:23:55 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 03 23:26:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 03 23:26:52 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 03 23:29:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 03 23:29:22 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 03 23:30:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d10cb7d5800, cur 1591252239 expire 1591252089 last 1591252012 Jun 03 23:30:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 03 23:34:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 03 23:34:47 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 03 23:36:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 03 23:36:54 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 03 23:39:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 03 23:39:24 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 03 23:44:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 03 23:44:49 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 03 23:44:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2478d3ec00, cur 1591253092 expire 1591252942 last 1591252865 Jun 03 23:44:52 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 03 23:47:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 03 23:47:37 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 03 23:52:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 03 23:52:47 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 03 23:55:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 03 23:55:13 fir-md1-s2 kernel: LustreError: Skipped 27 previous similar messages Jun 03 23:57:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 03 23:57:39 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 03 23:58:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1462495400, cur 1591253920 expire 1591253770 last 1591253693 Jun 03 23:58:40 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 00:05:18 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 00:05:18 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 00:07:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 00:07:16 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 00:09:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 00:09:47 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 00:12:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 3ae7e8c2-607a-4 (at 10.49.27.35@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44be3fc00, cur 1591254724 expire 1591254574 last 1591254497 Jun 04 00:12:04 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 00:15:21 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 00:15:21 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 00:18:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 00:18:08 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 00:19:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 00:19:49 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 00:22:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d24796f8c00, cur 1591255375 expire 1591255225 last 1591255148 Jun 04 00:22:55 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 00:26:17 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 00:26:17 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 00:30:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 00:30:16 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 00:30:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 00:30:16 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 00:36:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 00:36:24 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 00:36:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d047766cc00, cur 1591256193 expire 1591256043 last 1591255966 Jun 04 00:36:33 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 00:40:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 00:40:18 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 00:40:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 00:40:18 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 00:46:30 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 00:46:30 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 00:46:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d14634a0c00, cur 1591256796 expire 1591256646 last 1591256569 Jun 04 00:46:36 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 00:51:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 00:51:18 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 00:53:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 00:53:24 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 00:56:44 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 00:56:44 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 00:59:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d1464004800, cur 1591257581 expire 1591257431 last 1591257354 Jun 04 00:59:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 01:03:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 01:03:18 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 01:03:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 01:03:26 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 01:07:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 01:07:24 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 01:14:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 01:14:10 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 01:14:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 01:14:10 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 01:14:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d10245cec00, cur 1591258476 expire 1591258326 last 1591258249 Jun 04 01:14:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 01:17:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 01:17:48 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 01:24:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 01:24:12 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 01:26:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 01:26:18 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 01:27:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1462490000, cur 1591259262 expire 1591259112 last 1591259035 Jun 04 01:27:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 01:27:54 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 01:27:54 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 01:35:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 01:35:38 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 01:36:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 01:36:20 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 01:38:29 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 01:38:29 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 01:42:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d1462494000, cur 1591260157 expire 1591260007 last 1591259930 Jun 04 01:42:37 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 01:45:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 01:45:57 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 01:48:34 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 01:48:34 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 01:49:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 01:49:26 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 01:55:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1431471400, cur 1591260943 expire 1591260793 last 1591260716 Jun 04 01:55:43 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 01:56:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 01:56:49 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 01:58:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 01:58:59 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 01:59:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 01:59:28 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 02:06:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 02:06:51 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 02:08:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 66d106a0-48e6-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d12c4a57000, cur 1591261726 expire 1591261576 last 1591261499 Jun 04 02:08:46 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 02:09:03 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 02:09:03 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 02:11:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 02:11:52 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 02:17:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 02:17:27 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 02:19:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d037eba3400, cur 1591262348 expire 1591262198 last 1591262121 Jun 04 02:19:08 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 02:19:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 02:19:53 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 02:22:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 02:22:19 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 02:27:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 02:27:29 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 02:30:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 02:30:08 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 02:32:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 02:32:22 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 02:33:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d246a524000, cur 1591263218 expire 1591263068 last 1591262991 Jun 04 02:33:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 02:37:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 02:37:31 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 02:40:30 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 02:40:30 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 02:45:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 02:45:53 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 02:47:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d246a525c00, cur 1591264054 expire 1591263904 last 1591263827 Jun 04 02:47:34 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 02:47:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 02:47:58 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 02:50:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 02:50:57 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 02:58:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 02:58:00 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 03:00:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 03:00:22 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 03:00:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 03:00:59 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 03:01:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d1312f63c00, cur 1591264899 expire 1591264749 last 1591264672 Jun 04 03:01:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 03:09:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 03:09:43 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 03:11:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 03:11:02 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 03:13:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 03:13:53 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 03:15:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d03f08ce000, cur 1591265735 expire 1591265585 last 1591265508 Jun 04 03:15:35 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 03:20:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 03:20:52 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 03:21:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 03:21:24 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 03:28:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 03:28:23 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 03:29:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8ced1a7e7800, cur 1591266580 expire 1591266430 last 1591266353 Jun 04 03:29:40 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 03:30:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 03:30:54 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 03:31:26 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 03:31:26 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 03:41:50 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 03:41:50 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 03:41:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 03:41:54 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 03:41:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 03:41:54 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 03:43:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d240be52800, cur 1591267416 expire 1591267266 last 1591267189 Jun 04 03:43:36 fir-md1-s2 kernel: Lustre: Skipped 29 previous similar messages Jun 04 03:51:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 03:51:55 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 03:53:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 03:53:54 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 03:56:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 03:56:24 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 03:57:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d14638f6800, cur 1591268261 expire 1591268111 last 1591268034 Jun 04 03:57:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:01:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 04:01:57 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 04:03:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 04:03:56 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 04:09:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 04:09:55 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 04:11:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d23ed2f4400, cur 1591269097 expire 1591268947 last 1591268870 Jun 04 04:11:37 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:13:00 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 04:13:00 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 04:14:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 04:14:23 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 04:23:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 04:23:02 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 04:24:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 04:24:25 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:24:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 04:24:25 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 04:25:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d147948dc00, cur 1591269942 expire 1591269792 last 1591269715 Jun 04 04:25:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:33:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 04:33:08 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 04:35:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 04:35:51 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 04:37:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 04:37:56 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 04:39:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0465a79800, cur 1591270778 expire 1591270628 last 1591270551 Jun 04 04:39:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:43:19 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 04:43:19 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 04:47:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 04:47:25 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 04:52:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 04:52:26 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:53:37 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 04:53:37 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 04:53:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d24796fec00, cur 1591271623 expire 1591271473 last 1591271396 Jun 04 04:53:43 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 04:58:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 04:58:17 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 05:03:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1464001c00, cur 1591272225 expire 1591272075 last 1591271998 Jun 04 05:03:45 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 05:03:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 05:03:47 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 05:06:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 05:06:14 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 05:08:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 05:08:19 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 05:13:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 05:13:49 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 05:16:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0469187000, cur 1591273002 expire 1591272852 last 1591272775 Jun 04 05:16:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 05:20:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 05:20:27 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 05:20:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 05:20:27 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 05:24:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 05:24:08 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 05:26:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0469181c00, cur 1591273604 expire 1591273454 last 1591273377 Jun 04 05:26:44 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 05:32:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 05:32:09 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 05:34:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 05:34:15 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 05:34:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 05:34:57 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 05:40:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d23ed2f4800, cur 1591274407 expire 1591274257 last 1591274180 Jun 04 05:40:07 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 05:43:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 05:43:27 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 05:45:36 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 05:45:36 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 05:48:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 05:48:28 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 05:54:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 05:54:19 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 05:54:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1028c47800, cur 1591275285 expire 1591275135 last 1591275058 Jun 04 05:54:45 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 05:56:28 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 05:56:28 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 06:02:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 06:02:16 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 06:04:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 06:04:21 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 06:06:31 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 06:06:31 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 06:08:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d240c6cd400, cur 1591276088 expire 1591275938 last 1591275861 Jun 04 06:08:08 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 06:16:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 06:16:29 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 06:16:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 06:16:29 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 06:16:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 06:16:58 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 06:22:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0467ce8800, cur 1591276966 expire 1591276816 last 1591276739 Jun 04 06:22:46 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 06:27:17 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 06:27:17 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 06:28:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 06:28:11 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 06:30:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 06:30:17 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 06:36:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d23ed2f2c00, cur 1591277769 expire 1591277619 last 1591277542 Jun 04 06:36:09 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 06:37:52 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 06:37:52 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 06:39:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 06:39:29 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 06:44:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 06:44:30 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 06:48:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 06:48:11 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 06:49:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 06:49:31 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 06:50:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d120c2dec00, cur 1591278647 expire 1591278497 last 1591278420 Jun 04 06:50:47 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 06:54:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 06:54:32 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 06:58:22 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 06:58:22 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 07:00:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 07:00:30 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 07:04:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d2467e81400, cur 1591279457 expire 1591279307 last 1591279230 Jun 04 07:04:17 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 07:07:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 07:07:29 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 07:08:37 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 07:08:37 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 07:12:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 07:12:31 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 07:13:17 fir-md1-s2 kernel: perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 Jun 04 07:17:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 07:17:32 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 07:18:39 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 07:18:39 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 07:18:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d1464005c00, cur 1591280328 expire 1591280178 last 1591280101 Jun 04 07:18:48 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 07:22:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 07:22:33 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 07:29:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 07:29:06 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 07:30:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 07:30:54 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 07:32:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0479ed6c00, cur 1591281156 expire 1591281006 last 1591280929 Jun 04 07:32:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 07:33:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 07:33:00 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 07:39:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 07:39:08 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 07:43:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 07:43:02 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 07:45:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 07:45:32 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 07:46:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d04691dec00, cur 1591282009 expire 1591281859 last 1591281782 Jun 04 07:46:49 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 07:49:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 07:49:11 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 07:54:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 07:54:44 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 07:58:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 07:58:55 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 07:59:38 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 07:59:38 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 08:00:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d1028c46800, cur 1591282837 expire 1591282687 last 1591282610 Jun 04 08:00:37 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 08:06:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 08:06:02 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 08:09:40 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 08:09:40 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 08:13:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 08:13:33 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 08:14:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d247efa5c00, cur 1591283690 expire 1591283540 last 1591283463 Jun 04 08:14:50 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 08:16:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 08:16:54 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 08:19:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 08:19:45 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 08:26:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 08:26:56 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 08:26:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 08:26:56 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 08:28:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0fd5a1b000, cur 1591284518 expire 1591284368 last 1591284291 Jun 04 08:28:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 08:30:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 08:30:09 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 08:39:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 08:39:04 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 08:40:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 08:40:11 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 08:41:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 08:41:34 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 08:42:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d14638f4400, cur 1591285371 expire 1591285221 last 1591285144 Jun 04 08:42:51 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 08:50:25 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 08:50:25 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 08:50:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 08:50:46 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 08:54:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 08:54:57 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 08:56:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d240be56c00, cur 1591286199 expire 1591286049 last 1591285972 Jun 04 08:56:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:00:27 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 09:00:27 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 09:02:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 09:02:04 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 09:09:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 09:09:35 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:10:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 09:10:46 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 09:10:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d11c0ad2400, cur 1591287052 expire 1591286902 last 1591286825 Jun 04 09:10:52 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:12:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 09:12:06 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 09:20:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 09:20:57 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 09:23:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 09:23:04 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 09:23:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 09:23:04 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 09:24:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d032f365000, cur 1591287886 expire 1591287736 last 1591287659 Jun 04 09:24:46 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:31:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 09:31:12 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 09:35:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 09:35:06 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 09:37:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 09:37:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:38:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0477229400, cur 1591288733 expire 1591288583 last 1591288506 Jun 04 09:38:53 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:41:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 09:41:14 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 09:45:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 09:45:08 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 09:51:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 09:51:05 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 09:51:41 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 09:51:41 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 09:52:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d1461841800, cur 1591289567 expire 1591289417 last 1591289340 Jun 04 09:52:47 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 09:55:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 09:55:35 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 10:01:43 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 10:01:43 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 10:05:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 10:05:37 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 10:05:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 10:05:37 fir-md1-s2 kernel: Lustre: Skipped 31 previous similar messages Jun 04 10:06:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2377e38400, cur 1591290413 expire 1591290263 last 1591290186 Jun 04 10:06:53 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 10:11:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 10:11:46 fir-md1-s2 kernel: LustreError: Skipped 59 previous similar messages Jun 04 10:17:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 10:17:00 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 10:19:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 10:19:06 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 10:20:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0469e0f800, cur 1591291247 expire 1591291097 last 1591291020 Jun 04 10:20:47 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 10:22:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 10:22:13 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 10:28:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 10:28:37 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 10:32:15 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 10:32:15 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 10:33:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 10:33:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 10:34:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d2479b77400, cur 1591292094 expire 1591291944 last 1591291867 Jun 04 10:34:54 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 10:38:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 10:38:45 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 10:42:20 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 10:42:20 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 10:47:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 10:47:07 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 10:48:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d031178bc00, cur 1591292928 expire 1591292778 last 1591292701 Jun 04 10:48:48 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 10:49:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 10:49:12 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 10:52:39 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 10:52:39 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 10:59:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 10:59:14 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 11:01:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 11:01:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 11:02:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 11:02:46 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 11:02:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d23ed2f7000, cur 1591293775 expire 1591293625 last 1591293548 Jun 04 11:02:55 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 11:10:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 11:10:57 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 11:13:00 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 11:13:00 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 11:15:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 11:15:08 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 11:16:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d240be57000, cur 1591294609 expire 1591294459 last 1591294382 Jun 04 11:16:49 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 11:22:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 11:22:08 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 11:23:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 11:23:02 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 11:29:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 11:29:40 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 11:30:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce56fd24800, cur 1591295456 expire 1591295306 last 1591295229 Jun 04 11:30:56 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 11:32:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 11:32:10 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 11:33:21 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 11:33:21 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 11:39:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 11:39:42 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 11:43:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 11:43:09 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 11:44:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 11:44:10 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 11:44:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d03a9b23000, cur 1591296290 expire 1591296140 last 1591296063 Jun 04 11:44:50 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 11:52:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 11:52:39 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 11:54:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 11:54:24 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 11:55:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 11:55:10 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 11:58:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0cdf70cc00, cur 1591297137 expire 1591296987 last 1591296910 Jun 04 11:58:57 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 12:02:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 12:02:42 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 12:04:43 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 12:04:43 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 12:06:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 12:06:02 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 12:12:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2377e3e400, cur 1591297971 expire 1591297821 last 1591297744 Jun 04 12:12:51 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 12:14:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 12:14:45 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 12:15:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 12:15:45 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 12:16:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 12:16:04 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 12:24:56 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 12:24:56 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 12:25:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 12:25:48 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 12:26:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.26.35@o2ib1) Jun 04 12:26:07 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 12:26:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d5e31a1a-3c70-4 (at 10.49.26.35@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44b85c400, cur 1591298789 expire 1591298639 last 1591298562 Jun 04 12:26:29 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 12:34:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 12:34:58 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 12:37:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 12:37:05 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 12:39:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 12:39:10 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 12:40:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d145f245000, cur 1591299652 expire 1591299502 last 1591299425 Jun 04 12:40:52 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 12:44:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 12:44:59 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 12:48:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 12:48:41 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 12:53:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 12:53:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 12:54:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cebce370400, cur 1591300499 expire 1591300349 last 1591300272 Jun 04 12:54:59 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 12:55:27 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 12:55:27 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 12:58:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 12:58:43 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 13:03:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 13:03:44 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 13:06:07 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 13:06:07 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 13:08:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d246ac63800, cur 1591301333 expire 1591301183 last 1591301106 Jun 04 13:08:53 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 13:09:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 13:09:17 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 13:16:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 13:16:42 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 13:17:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 13:17:11 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 13:19:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 13:19:19 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 13:23:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d1028c45800, cur 1591302180 expire 1591302030 last 1591301953 Jun 04 13:23:00 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 13:26:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 13:26:44 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 13:27:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 13:27:13 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 13:30:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 13:30:05 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 13:36:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d14717f0800, cur 1591303014 expire 1591302864 last 1591302787 Jun 04 13:36:54 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 13:37:41 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 13:37:41 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 13:39:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 13:39:48 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 13:40:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 13:40:07 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 13:47:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 13:47:42 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 13:49:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 13:49:50 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 13:51:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d24712f8800, cur 1591303861 expire 1591303711 last 1591303634 Jun 04 13:51:01 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 13:52:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 13:52:15 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 13:57:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 13:57:45 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 14:00:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 14:00:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 14:01:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0cdfae4c00, cur 1591304463 expire 1591304313 last 1591304236 Jun 04 14:01:03 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 14:02:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 14:02:17 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 14:07:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 14:07:47 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 14:12:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 14:12:44 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 14:12:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 14:12:44 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 14:14:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d247940c800, cur 1591305247 expire 1591305097 last 1591305020 Jun 04 14:14:07 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 14:18:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 14:18:14 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 14:22:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 14:22:46 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 14:22:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 14:22:46 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 14:28:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 14:28:16 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 14:29:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d04063cf400, cur 1591306144 expire 1591305994 last 1591305917 Jun 04 14:29:04 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 14:33:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 14:33:44 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 14:35:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 14:35:50 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 14:39:22 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 14:39:22 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 14:42:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d2471fa0800, cur 1591306927 expire 1591306777 last 1591306700 Jun 04 14:42:07 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 14:45:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 14:45:46 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 14:45:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 14:45:52 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 14:49:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 14:49:24 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 14:56:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 14:56:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 14:56:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 14:56:38 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 14:57:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1461847000, cur 1591307824 expire 1591307674 last 1591307597 Jun 04 14:57:04 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 14:59:38 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 14:59:38 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 15:06:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 15:06:40 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 15:08:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 15:08:46 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 15:09:40 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 15:09:40 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 15:10:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d24796fb800, cur 1591308608 expire 1591308458 last 1591308381 Jun 04 15:10:08 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 15:18:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 15:18:04 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 15:18:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 15:18:48 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 15:20:32 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 15:20:32 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 15:25:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0391e5d000, cur 1591309505 expire 1591309355 last 1591309278 Jun 04 15:25:05 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 15:28:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 15:28:25 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 15:30:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 15:30:48 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 15:31:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 15:31:52 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 15:38:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2473632400, cur 1591310289 expire 1591310139 last 1591310062 Jun 04 15:38:09 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 15:39:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 15:39:17 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 15:41:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 15:41:02 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 15:41:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 15:41:54 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 15:49:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 15:49:19 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 15:51:54 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 15:51:54 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 15:52:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 15:52:40 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 15:53:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d24677ce800, cur 1591311186 expire 1591311036 last 1591310959 Jun 04 15:53:06 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 15:59:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 15:59:53 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 16:01:56 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 16:01:56 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 16:04:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 16:04:48 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 16:06:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0479fea400, cur 1591311970 expire 1591311820 last 1591311743 Jun 04 16:06:10 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 16:09:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 16:09:55 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 16:12:40 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 16:12:40 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 04 16:14:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 16:14:50 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 16:19:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 16:19:57 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 16:20:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d240c6cc800, cur 1591312823 expire 1591312673 last 1591312596 Jun 04 16:20:23 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 16:22:50 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 16:22:50 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 16:24:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 16:24:52 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 16:30:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 16:30:24 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 16:33:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 16:33:09 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 16:34:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0cdfdbf000, cur 1591313651 expire 1591313501 last 1591313424 Jun 04 16:34:11 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 16:37:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 16:37:50 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 16:40:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 16:40:26 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 16:43:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 16:43:11 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages Jun 04 16:47:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 16:47:52 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 16:48:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d24796fe800, cur 1591314504 expire 1591314354 last 1591314277 Jun 04 16:48:24 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 16:51:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 16:51:12 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 16:53:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 16:53:47 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 17:00:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 17:00:55 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 17:01:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 17:01:14 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 17:02:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0cdf70a000, cur 1591315332 expire 1591315182 last 1591315105 Jun 04 17:02:12 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 17:04:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 17:04:14 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 17:10:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 17:10:58 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 17:12:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 17:12:38 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 17:14:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 17:14:16 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 17:16:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2309bbe400, cur 1591316185 expire 1591316035 last 1591315958 Jun 04 17:16:25 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 17:22:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 17:22:59 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 17:24:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 17:24:20 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 17:24:35 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 17:24:35 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 17:30:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d2473c90c00, cur 1591317013 expire 1591316863 last 1591316786 Jun 04 17:30:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 17:33:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 17:33:51 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 17:34:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 17:34:59 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 17:38:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 17:38:52 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 17:43:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 17:43:53 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 17:44:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d246a522800, cur 1591317866 expire 1591317716 last 1591317639 Jun 04 17:44:26 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 17:45:01 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 17:45:01 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 17:48:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 17:48:54 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 17:54:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 17:54:27 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 17:55:28 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 17:55:28 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 17:58:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d032f719800, cur 1591318694 expire 1591318544 last 1591318467 Jun 04 17:58:14 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 18:01:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 18:01:52 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 18:04:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 18:04:29 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 18:05:30 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 18:05:30 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 18:11:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 18:11:54 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 18:12:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0cdffcfc00, cur 1591319547 expire 1591319397 last 1591319320 Jun 04 18:12:27 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 18:15:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 18:15:15 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 18:16:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 18:16:09 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 18:24:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 18:24:58 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 18:25:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 18:25:17 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 18:26:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 18:26:11 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 18:26:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d092a7c8800, cur 1591320375 expire 1591320225 last 1591320148 Jun 04 18:26:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 18:35:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 18:35:00 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 18:36:39 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 18:36:39 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 18:37:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 18:37:25 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 18:41:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d230860d800, cur 1591321272 expire 1591321122 last 1591321045 Jun 04 18:41:12 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 18:47:05 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 18:47:05 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 18:48:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 18:48:48 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 18:48:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 18:48:48 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 18:54:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d02b1e08800, cur 1591322081 expire 1591321931 last 1591321854 Jun 04 18:54:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 18:57:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 18:57:08 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 19:00:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 19:00:25 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 19:02:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 19:02:55 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 19:07:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 19:07:10 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages Jun 04 19:09:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0a64f51400, cur 1591322953 expire 1591322803 last 1591322726 Jun 04 19:09:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 19:10:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 19:10:33 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 19:12:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 19:12:57 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 19:17:50 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 19:17:50 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 19:21:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 19:21:00 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 19:22:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1431473400, cur 1591323761 expire 1591323611 last 1591323534 Jun 04 19:22:41 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 19:25:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 19:25:55 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 19:28:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 19:28:46 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 19:31:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 19:31:02 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 19:34:56 fir-md1-s2 kernel: Lustre: DEBUG MARKER: Thu Jun 4 19:34:56 2020 Jun 04 19:35:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 19:35:57 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 19:37:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d10245cdc00, cur 1591324633 expire 1591324483 last 1591324406 Jun 04 19:37:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 19:38:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 19:38:48 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 19:42:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 19:42:45 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 19:49:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 19:49:01 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 19:49:15 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 19:49:15 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 19:50:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d0fd5913c00, cur 1591325442 expire 1591325292 last 1591325215 Jun 04 19:50:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 19:53:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 19:53:56 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 19:59:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 19:59:03 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 19:59:18 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 19:59:18 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages Jun 04 20:03:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 20:03:58 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 20:05:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2471fa4c00, cur 1591326314 expire 1591326164 last 1591326087 Jun 04 20:05:14 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 20:09:28 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 20:09:28 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 20:12:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 20:12:51 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 20:14:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 20:14:56 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 20:18:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0cdf70d400, cur 1591327123 expire 1591326973 last 1591326896 Jun 04 20:18:43 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 20:19:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 20:19:47 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 20:26:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 20:26:58 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 20:26:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 20:26:58 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 20:29:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 20:29:49 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages Jun 04 20:33:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d22e7fb8800, cur 1591327995 expire 1591327845 last 1591327768 Jun 04 20:33:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 20:37:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 20:37:50 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 20:37:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 20:37:50 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 20:40:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 20:40:49 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 04 20:46:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d047766c400, cur 1591328804 expire 1591328654 last 1591328577 Jun 04 20:46:44 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 20:47:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 20:47:52 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 20:49:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 20:49:58 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 20:50:51 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 20:50:51 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 21:00:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 21:00:00 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 04 21:00:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 21:00:00 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 21:00:34 fir-md1-s2 kernel: perf: interrupt took too long (3130 > 3128), lowering kernel.perf_event_max_sample_rate to 63000 Jun 04 21:01:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 21:01:11 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 21:01:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d10245ca800, cur 1591329676 expire 1591329526 last 1591329449 Jun 04 21:01:16 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 21:11:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 21:11:13 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 21:11:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 21:11:23 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 21:13:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 21:13:29 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 21:15:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0986fda800, cur 1591330510 expire 1591330360 last 1591330283 Jun 04 21:15:10 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 21:21:23 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 21:21:23 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 21:23:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 21:23:00 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 21:28:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 21:28:01 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 21:29:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0986fdc400, cur 1591331357 expire 1591331207 last 1591331130 Jun 04 21:29:17 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 21:31:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 21:31:24 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 21:33:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 21:33:08 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 21:41:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 21:41:30 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 21:42:31 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 21:42:31 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 21:43:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1462497c00, cur 1591332191 expire 1591332041 last 1591331964 Jun 04 21:43:11 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 21:43:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 21:43:35 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 21:52:44 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 21:52:44 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 21:53:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 21:53:37 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 21:56:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 21:56:01 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 21:57:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1465683000, cur 1591333038 expire 1591332888 last 1591332811 Jun 04 21:57:18 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:02:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 22:02:46 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 22:05:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 22:05:20 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 22:09:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 22:09:30 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 22:11:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d091bf79400, cur 1591333872 expire 1591333722 last 1591333645 Jun 04 22:11:12 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:13:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 22:13:13 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 22:16:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 22:16:31 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 22:23:15 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 22:23:15 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 22:24:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 22:24:02 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:25:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d092a3c0400, cur 1591334719 expire 1591334569 last 1591334492 Jun 04 22:25:19 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:26:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 22:26:33 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 22:33:18 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 22:33:18 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 22:37:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 22:37:31 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 04 22:37:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 22:37:31 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 22:39:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0932eb6c00, cur 1591335553 expire 1591335403 last 1591335326 Jun 04 22:39:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:43:20 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 22:43:20 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 22:49:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 22:49:33 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 22:52:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 22:52:03 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:53:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d230a644000, cur 1591336400 expire 1591336250 last 1591336173 Jun 04 22:53:20 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 22:53:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 04 22:53:47 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 23:00:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 23:00:25 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 04 23:02:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 23:02:05 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 23:04:28 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 23:04:28 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 04 23:07:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0986fdf800, cur 1591337233 expire 1591337083 last 1591337006 Jun 04 23:07:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 23:10:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 23:10:27 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 23:15:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 23:15:03 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 23:15:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 23:15:24 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 04 23:21:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d2467e84800, cur 1591338081 expire 1591337931 last 1591337854 Jun 04 23:21:21 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 23:22:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 23:22:35 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 04 23:25:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 04 23:25:05 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 23:25:34 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 23:25:34 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 04 23:33:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 23:33:57 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 04 23:35:36 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 23:35:36 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 04 23:35:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d23086bc400, cur 1591338939 expire 1591338789 last 1591338712 Jun 04 23:35:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 23:38:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 23:38:08 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 04 23:45:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 04 23:45:35 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 23:45:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 04 23:45:55 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 04 23:48:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 04 23:48:09 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 04 23:49:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1468e98800, cur 1591339762 expire 1591339612 last 1591339535 Jun 04 23:49:22 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 04 23:55:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 04 23:55:41 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 04 23:56:05 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 04 23:56:05 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 00:01:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 00:01:57 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 00:03:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d12c4a52000, cur 1591340619 expire 1591340469 last 1591340392 Jun 05 00:03:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 00:06:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 00:06:08 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 00:07:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 00:07:11 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 05 00:16:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 00:16:06 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 00:16:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 00:16:10 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 00:17:13 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 00:17:13 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 00:17:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d02b5f6f800, cur 1591341442 expire 1591341292 last 1591341215 Jun 05 00:17:22 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 00:26:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 00:26:08 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 00:27:27 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 00:27:27 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 00:27:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 00:27:52 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 00:31:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0906bcac00, cur 1591342299 expire 1591342149 last 1591342072 Jun 05 00:31:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 00:37:37 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 00:37:37 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 00:39:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 00:39:06 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 00:39:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 00:39:06 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 00:45:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d02b4a7dc00, cur 1591343123 expire 1591342973 last 1591342896 Jun 05 00:45:23 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 00:47:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 00:47:45 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 00:49:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 00:49:08 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 00:49:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 00:49:08 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 00:57:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 00:57:58 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 05 00:59:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0929b0cc00, cur 1591343980 expire 1591343830 last 1591343753 Jun 05 00:59:40 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 01:00:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 01:00:04 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 01:02:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 01:02:10 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 01:08:01 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 01:08:01 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 01:12:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 01:12:08 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 01:12:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 01:12:12 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 01:13:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d11c0ad5400, cur 1591344804 expire 1591344654 last 1591344577 Jun 05 01:13:24 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 01:18:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 01:18:45 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 01:23:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 01:23:00 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 01:23:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 01:23:00 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 01:23:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d22e6e45c00, cur 1591345406 expire 1591345256 last 1591345179 Jun 05 01:23:26 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 05 01:29:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 01:29:09 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 01:33:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 01:33:02 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 01:35:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 01:35:08 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 01:36:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0936f7fc00, cur 1591346188 expire 1591346038 last 1591345961 Jun 05 01:36:28 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 01:39:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 05 01:39:14 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 01:44:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 01:44:24 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 01:45:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 01:45:10 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 01:49:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 01:49:58 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 01:51:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d22e7fb9400, cur 1591347087 expire 1591346937 last 1591346860 Jun 05 01:51:27 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 01:54:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 01:54:47 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 01:58:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 01:58:11 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 02:00:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 02:00:16 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 02:04:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d147948d400, cur 1591347869 expire 1591347719 last 1591347642 Jun 05 02:04:29 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 02:05:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 02:05:39 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 02:08:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 02:08:14 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 02:11:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 02:11:09 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 02:15:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 02:15:41 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 02:19:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 02:19:02 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 02:19:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0469183800, cur 1591348768 expire 1591348618 last 1591348541 Jun 05 02:19:28 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 02:21:19 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 02:21:19 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 02:26:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 02:26:12 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 02:31:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 02:31:09 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 02:31:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 02:31:46 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 02:32:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d147a667800, cur 1591349550 expire 1591349400 last 1591349323 Jun 05 02:32:30 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 02:36:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 02:36:15 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 02:41:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 02:41:11 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 02:41:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 02:41:48 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 02:47:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 02:47:03 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 02:47:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0961761000, cur 1591350449 expire 1591350299 last 1591350222 Jun 05 02:47:29 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 02:52:07 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 02:52:07 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 02:54:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 02:54:13 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 02:57:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 02:57:05 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 03:00:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d092778a400, cur 1591351231 expire 1591351081 last 1591351004 Jun 05 03:00:31 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 03:02:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 03:02:10 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 05 03:04:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 03:04:15 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 03:08:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 03:08:26 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 03:12:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 03:12:12 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 03:15:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 03:15:04 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 03:15:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d03a9b20000, cur 1591352130 expire 1591351980 last 1591351903 Jun 05 03:15:30 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 03:18:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 03:18:49 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 03:22:38 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 03:22:38 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 05 03:27:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 03:27:11 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 03:28:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0cdfae7c00, cur 1591352911 expire 1591352761 last 1591352684 Jun 05 03:28:31 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 03:29:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 03:29:42 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 03:32:41 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 03:32:41 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 05 03:37:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 03:37:13 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 03:39:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 03:39:44 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 03:42:42 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 03:42:42 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 03:43:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0986f0fc00, cur 1591353811 expire 1591353661 last 1591353584 Jun 05 03:43:31 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 03:47:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 03:47:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 03:50:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 03:50:14 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 03:52:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 03:52:45 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 03:56:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2302b14400, cur 1591354592 expire 1591354442 last 1591354365 Jun 05 03:56:32 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:00:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 04:00:13 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 04:00:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 04:00:16 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 04:03:11 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 04:03:11 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 05 04:10:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 04:10:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:11:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2479738c00, cur 1591355492 expire 1591355342 last 1591355265 Jun 05 04:11:32 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:11:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 04:11:56 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 04:13:32 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 04:13:32 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 04:23:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 04:23:16 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:23:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 04:23:16 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 04:24:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 04:24:06 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 04:24:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0963f0e000, cur 1591356272 expire 1591356122 last 1591356045 Jun 05 04:24:32 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:34:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 04:34:08 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:34:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 04:34:08 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 04:34:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d02b6771c00, cur 1591356872 expire 1591356722 last 1591356645 Jun 05 04:34:32 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 05 04:34:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 04:34:59 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 04:44:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 04:44:10 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 04:45:15 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 04:45:15 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 04:46:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 04:46:16 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:47:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d090d3d9c00, cur 1591357677 expire 1591357527 last 1591357450 Jun 05 04:47:57 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 04:55:28 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 04:55:28 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 04:56:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 04:56:15 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 04:56:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 04:56:18 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 05:02:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d092a7cf000, cur 1591358552 expire 1591358402 last 1591358325 Jun 05 05:02:32 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:05:30 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 05:05:30 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 05:06:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 05:06:17 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 05:09:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 05:09:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:15:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 05:15:57 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 05:15:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d02b465fc00, cur 1591359358 expire 1591359208 last 1591359131 Jun 05 05:15:58 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:16:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 05:16:47 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 05:19:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 05:19:40 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:25:59 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 05:25:59 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 05:26:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 05:26:49 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 05:30:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 05:30:10 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 05:30:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d1463a5a400, cur 1591360233 expire 1591360083 last 1591360006 Jun 05 05:30:33 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:36:26 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 05:36:26 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 05:37:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 05:37:39 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 05:42:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 05:42:18 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:43:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0cdee6bc00, cur 1591361039 expire 1591360889 last 1591360812 Jun 05 05:43:59 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:46:54 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 05 05:46:54 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 05:47:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 05:47:41 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 05:52:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 05:52:20 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 05:57:35 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 05:57:35 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 05:58:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d230860e000, cur 1591361914 expire 1591361764 last 1591361687 Jun 05 05:58:34 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 05:59:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 05:59:01 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 06:02:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 06:02:22 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 06:07:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 06:07:57 fir-md1-s2 kernel: LustreError: Skipped 25 previous similar messages Jun 05 06:09:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 06:09:50 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 06:12:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d22f8be7400, cur 1591362745 expire 1591362595 last 1591362518 Jun 05 06:12:25 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 06:15:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 06:15:20 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 06:18:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 06:18:24 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 06:20:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 06:20:18 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 06:25:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 06:25:22 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 06:26:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0963f03c00, cur 1591363595 expire 1591363445 last 1591363368 Jun 05 06:26:35 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 06:28:26 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 05 06:28:26 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 06:30:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 06:30:20 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 06:36:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 06:36:11 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 06:38:28 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 06:38:28 fir-md1-s2 kernel: LustreError: Skipped 24 previous similar messages Jun 05 06:40:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0986fde400, cur 1591364426 expire 1591364276 last 1591364199 Jun 05 06:40:26 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 06:40:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 06:40:50 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 06:48:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 06:48:19 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 06:48:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 06:48:46 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 06:50:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 06:50:52 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 06:54:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0469183400, cur 1591365276 expire 1591365126 last 1591365049 Jun 05 06:54:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 06:58:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 06:58:21 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 06:58:48 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 06:58:48 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 07:02:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 07:02:31 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 07:08:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d090c628c00, cur 1591366107 expire 1591365957 last 1591365880 Jun 05 07:08:27 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 07:08:51 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 07:08:51 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 05 07:11:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 07:11:21 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 07:13:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 07:13:52 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 07:19:18 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 07:19:18 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 05 07:21:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 07:21:23 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 07:22:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d22fb77b000, cur 1591366957 expire 1591366807 last 1591366730 Jun 05 07:22:37 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 07:24:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 07:24:44 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 07:29:20 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 07:29:20 fir-md1-s2 kernel: LustreError: Skipped 22 previous similar messages Jun 05 07:32:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 07:32:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 07:34:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 07:34:46 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 07:36:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d22f5e81000, cur 1591367788 expire 1591367638 last 1591367561 Jun 05 07:36:28 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 07:40:27 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 07:40:27 fir-md1-s2 kernel: LustreError: Skipped 26 previous similar messages Jun 05 07:44:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 07:44:20 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 07:46:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 07:46:51 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 07:50:29 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 07:50:29 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 07:50:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d02b0f7f400, cur 1591368638 expire 1591368488 last 1591368411 Jun 05 07:50:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 07:54:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 07:54:22 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 07:56:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 07:56:53 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 08:00:34 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 08:00:34 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 08:04:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d047962d800, cur 1591369469 expire 1591369319 last 1591369242 Jun 05 08:04:29 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 08:07:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 08:07:23 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 08:07:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 08:07:23 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 08:10:44 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.16@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 08:10:44 fir-md1-s2 kernel: LustreError: Skipped 23 previous similar messages Jun 05 08:17:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 08:17:25 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 08:17:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 08:17:25 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 08:18:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2300e6a800, cur 1591370319 expire 1591370169 last 1591370092 Jun 05 08:18:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 08:21:36 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 08:21:36 fir-md1-s2 kernel: LustreError: Skipped 21 previous similar messages Jun 05 08:28:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 08:28:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 08:28:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 08:28:15 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 08:28:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d147948a800, cur 1591370921 expire 1591370771 last 1591370694 Jun 05 08:28:41 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 08:31:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 05 08:31:53 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages Jun 05 08:38:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 08:38:17 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 05 08:40:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 08:40:22 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 08:41:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d099835d400, cur 1591371702 expire 1591371552 last 1591371475 Jun 05 08:41:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 08:42:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.47@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 08:42:06 fir-md1-s2 kernel: LustreError: Skipped 17 previous similar messages Jun 05 08:49:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 08:49:37 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 08:50:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) reconnecting Jun 05 08:50:24 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 08:52:22 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 08:52:22 fir-md1-s2 kernel: LustreError: Skipped 17 previous similar messages Jun 05 08:56:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 232937e8-bc43-4 (at 10.49.27.31@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d1187bb8800, cur 1591372602 expire 1591372452 last 1591372375 Jun 05 08:56:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 09:00:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 09:00:01 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 09:02:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.3.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 09:02:24 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages Jun 05 09:03:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 09:03:25 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 09:06:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5bccfc24-b020-4 (at 10.50.3.47@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44bfa8000, cur 1591373207 expire 1591373057 last 1591372980 Jun 05 09:06:47 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 09:10:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.17@o2ib1) Jun 05 09:10:57 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 09:13:20 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 09:13:20 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 09:13:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client efcc39a6-2179-4 (at 10.49.23.17@o2ib1) reconnecting Jun 05 09:13:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Jun 05 09:17:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8cbe46f9-727e-4 (at 10.50.4.16@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d092a3c5400, cur 1591373847 expire 1591373697 last 1591373620 Jun 05 09:17:27 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 05 09:17:49 fir-md1-s2 kernel: Lustre: 23943:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373862/real 1591373862] req@ffff8d0805f93600 x1668533450699328/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373869 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 05 09:17:49 fir-md1-s2 kernel: Lustre: 23943:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 05 09:17:51 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373864/real 1591373864] req@ffff8d0766330d80 x1668533450731840/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373871 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 05 09:17:56 fir-md1-s2 kernel: Lustre: 24060:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373869/real 1591373869] req@ffff8cec0060d100 x1668533450699264/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373876 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:17:58 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373871/real 1591373871] req@ffff8d0766330d80 x1668533450731840/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 
lens 296/224 e 0 to 1 dl 1591373878 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:17:58 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 05 09:18:03 fir-md1-s2 kernel: Lustre: 24060:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373876/real 1591373876] req@ffff8cec0060d100 x1668533450699264/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373883 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:18:03 fir-md1-s2 kernel: Lustre: 24060:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 05 09:18:12 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373885/real 1591373885] req@ffff8d0766330d80 x1668533450731840/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373892 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:18:12 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jun 05 09:18:31 fir-md1-s2 kernel: Lustre: 23943:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373904/real 1591373904] req@ffff8d0805f93600 x1668533450699328/t0(0) o104->fir-MDT0001@10.50.10.63@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373911 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:18:31 fir-md1-s2 kernel: Lustre: 23943:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Jun 05 09:19:04 fir-md1-s2 kernel: Lustre: 22036:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591373937/real 1591373937] req@ffff8d01a62f0900 x1668533451972928/t0(0) o104->fir-MDT0001@10.50.10.64@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591373944 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 05 09:19:04 fir-md1-s2 kernel: Lustre: 
22036:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 14 previous similar messages Jun 05 09:19:27 fir-md1-s2 kernel: LustreError: 24060:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.50.10.63@o2ib2) failed to reply to blocking AST (req@ffff8cec0060d100 x1668533450699264 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d21d9332880/0x1587f60376bbdb99 lrc: 4/0,0 mode: PR/PR res: [0x240059289:0x1d12:0x0].0x0 bits 0x1b/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.50.10.63@o2ib2 remote: 0x2be61d30de31888 expref: 159897 pid: 23711 timeout: 139962 lvb_type: 0 Jun 05 09:19:27 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.50.10.63@o2ib2 was evicted due to a lock blocking callback time out: rc -110 Jun 05 09:19:27 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.50.10.63@o2ib2 ns: mdt-fir-MDT0001_UUID lock: ffff8d21ed7fc800/0x1587f60376bbb6d9 lrc: 3/0,0 mode: PR/PR res: [0x2400591b9:0x17d71:0x0].0x0 bits 0x1b/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.50.10.63@o2ib2 remote: 0x2be61d30de31731 expref: 159898 pid: 23711 timeout: 0 lvb_type: 0 Jun 05 09:19:27 fir-md1-s2 kernel: LustreError: 24060:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 10 previous similar messages Jun 05 09:20:08 fir-md1-s2 kernel: Lustre: 23908:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591374001/real 1591374001] req@ffff8ce74bfbd100 x1668533452247168/t0(0) o104->fir-MDT0001@10.50.10.64@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591374008 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:20:08 fir-md1-s2 kernel: Lustre: 23908:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 144 previous similar messages Jun 05 09:20:23 fir-md1-s2 kernel: LustreError: 23992:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.50.10.63@o2ib2) returned error from blocking AST 
(req@ffff8d01a62fc800 x1668533453681600 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d0763bc3180/0x1587f603773051e9 lrc: 4/0,0 mode: PR/PR res: [0x240056dc3:0x162e1:0x0].0x0 bits 0x1b/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.50.10.63@o2ib2 remote: 0x2be61d30de34f77 expref: 26222 pid: 23965 timeout: 140025 lvb_type: 0 Jun 05 09:20:23 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.50.10.63@o2ib2 was evicted due to a lock blocking callback time out: rc -107 Jun 05 09:20:23 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 09:20:23 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.50.10.63@o2ib2 ns: mdt-fir-MDT0001_UUID lock: ffff8d0763bc3180/0x1587f603773051e9 lrc: 3/0,0 mode: PR/PR res: [0x240056dc3:0x162e1:0x0].0x0 bits 0x1b/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.50.10.63@o2ib2 remote: 0x2be61d30de34f77 expref: 26149 pid: 23965 timeout: 0 lvb_type: 0 Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 23654:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.50.10.64@o2ib2) failed to reply to blocking AST (req@ffff8ce8643a0000 x1668533452296192 status 0 rc -5), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d0788e3c140/0x1587f60377310366 lrc: 4/0,0 mode: PR/PR res: [0x2400592aa:0xafea:0x0].0x0 bits 0x1b/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 10.50.10.64@o2ib2 remote: 0xb24f566e66b36bd8 expref: 145146 pid: 24042 timeout: 140027 lvb_type: 0 Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 24052:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.50.10.64@o2ib2) failed to reply to blocking AST (req@ffff8ce6d8b92d00 x1668533452307200 status 0 rc -5), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d21effd9d40/0x1587f60377333e1e lrc: 4/0,0 mode: PR/PR res: [0x2400592aa:0xaff9:0x0].0x0 bits 0x1b/0x0 rrc: 16 type: IBT flags: 0x60200400000020 nid: 
10.50.10.64@o2ib2 remote: 0xb24f566e66b36eb7 expref: 145146 pid: 23993 timeout: 140027 lvb_type: 0 Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 24052:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 3 previous similar messages Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.50.10.64@o2ib2 was evicted due to a lock blocking callback time out: rc -5 Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 94s: evicting client at 10.50.10.64@o2ib2 ns: mdt-fir-MDT0001_UUID lock: ffff8cf314b2de80/0x1587f603772c5118 lrc: 3/0,0 mode: PR/PR res: [0x240056dc3:0x162ce:0x0].0x0 bits 0x1b/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.50.10.64@o2ib2 remote: 0xb24f566e66b36890 expref: 145147 pid: 22036 timeout: 0 lvb_type: 0 Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Jun 05 09:20:31 fir-md1-s2 kernel: LustreError: 23654:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 10 previous similar messages Jun 05 09:22:16 fir-md1-s2 kernel: Lustre: 20937:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591374129/real 1591374129] req@ffff8cec13302400 x1668533454212032/t0(0) o104->fir-MDT0001@10.50.10.64@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591374136 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 09:22:16 fir-md1-s2 kernel: Lustre: 20937:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 204 previous similar messages Jun 05 09:24:03 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 09:24:03 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 09:35:20 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 09:35:20 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 09:39:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 79af667d-8adb-4 (at 10.50.8.18@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44b42d000, cur 1591375197 expire 1591375047 last 1591374970 Jun 05 09:39:57 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 09:40:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.8.18@o2ib2) Jun 05 09:40:50 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 05 09:43:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.12.9@o2ib2) Jun 05 09:43:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Jun 05 09:45:31 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 09:45:31 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 09:49:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.31@o2ib1) Jun 05 09:49:03 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Jun 05 09:56:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.4.16@o2ib2) Jun 05 09:56:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Jun 05 09:57:05 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 09:57:05 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 10:07:32 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 10:07:32 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 10:17:34 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 10:17:34 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 10:29:17 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 10:29:17 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 10:39:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 10:39:53 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 10:48:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 41d1e2dc-df8a-4 (at 10.50.13.10@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44b798c00, cur 1591379288 expire 1591379138 last 1591379061 Jun 05 10:51:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 10:51:10 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 11:01:29 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 05 11:01:29 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 11:04:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.4.16@o2ib2) Jun 05 11:04:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Jun 05 11:12:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 11:12:55 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 11:23:22 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 11:23:22 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 11:33:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 11:33:24 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 11:44:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 11:44:08 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 11:55:50 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 11:55:50 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 12:07:07 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Jun 05 12:07:07 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 12:17:18 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 12:17:18 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 12:28:02 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 12:28:02 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 12:39:03 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 12:39:03 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 12:49:46 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 12:49:46 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 12:59:57 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 12:59:57 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 13:10:41 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 13:10:41 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 13:21:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 13:21:58 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 13:32:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.8.19@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 13:32:09 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 13:43:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.4.65@o2ib2) Jun 05 13:43:43 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 13:43:43 fir-md1-s2 kernel: LustreError: Skipped 10 previous similar messages Jun 05 13:45:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.9.37@o2ib2) Jun 05 13:45:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 52de0f47-9c11-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44c313400, cur 1591389910 expire 1591389760 last 1591389683 Jun 05 13:54:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 13:54:10 fir-md1-s2 kernel: LustreError: Skipped 9 previous similar messages Jun 05 14:04:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 14:04:12 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 14:08:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.12.14@o2ib2) Jun 05 14:09:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.13.10@o2ib2) Jun 05 14:14:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client cd9bae51-00a4-4 (at 10.50.8.19@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44bbb7000, cur 1591391664 expire 1591391514 last 1591391437 Jun 05 14:14:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Jun 05 14:14:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.8.19@o2ib2) Jun 05 14:15:55 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 14:15:55 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages Jun 05 14:20:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 05 14:27:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 14:27:12 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 14:33:06 fir-md1-s2 kernel: LNet: Service thread pid 23945 was inactive for 200.66s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 05 14:33:06 fir-md1-s2 kernel: Pid: 23945, comm: mdt03_081 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 05 14:33:06 fir-md1-s2 kernel: Call Trace: Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 05 14:33:06 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 05 14:33:06 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 05 14:33:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591392786.23945 Jun 05 14:33:06 fir-md1-s2 kernel: Pid: 23763, comm: mdt03_042 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 05 14:33:06 fir-md1-s2 kernel: Call Trace: Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 05 14:33:06 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 05 14:33:06 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 05 14:33:06 fir-md1-s2 kernel: Pid: 23770, comm: mdt03_044 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 05 14:33:06 fir-md1-s2 kernel: Call Trace: Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] 
ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 05 14:33:06 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 05 14:33:06 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 05 14:33:06 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 05 14:33:06 fir-md1-s2 kernel: Pid: 24002, comm: mdt03_095 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 05 14:33:06 fir-md1-s2 kernel: Call Trace: Jun 05 14:33:06 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 05 14:33:07 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 05 14:33:07 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 05 
14:33:07 fir-md1-s2 kernel: LNet: Service thread pid 23636 was inactive for 201.19s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 05 14:33:07 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Jun 05 14:33:07 fir-md1-s2 kernel: Pid: 23636, comm: mdt01_019 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 05 14:33:07 fir-md1-s2 kernel: Call Trace: Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 05 14:33:07 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 05 14:33:07 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 05 14:33:07 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 05 14:33:07 fir-md1-s2 kernel: LNet: Service thread pid 23919 was inactive for 201.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Jun 05 14:34:45 fir-md1-s2 kernel: LustreError: 23987:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591392585, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8d0e8ffe1440/0x1587f604687bc2df lrc: 3/0,1 mode: --/EX res: [0x24003e244:0x38b4:0x0].0x0 bits 0x8/0x0 rrc: 92 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23987 timeout: 0 lvb_type: 0 Jun 05 14:34:45 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591392885.23919 Jun 05 14:34:45 fir-md1-s2 kernel: LustreError: 23987:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) Skipped 40 previous similar messages Jun 05 14:34:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 05bde955-e83a-4 (at 10.50.1.51@o2ib2) reconnecting Jun 05 14:34:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 14:34:45 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Jun 05 14:38:04 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 14:38:04 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 14:39:46 fir-md1-s2 kernel: Lustre: 23950:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply req@ffff8d0468feb180 x1659675613835456/t0(0) o101->8d039ea4-d34b-4@10.50.1.56@o2ib2:416/0 lens 376/1600 e 12 to 0 dl 1591393191 ref 2 fl Interpret:/0/0 rc 0/0 Jun 05 14:39:46 fir-md1-s2 kernel: Lustre: 23950:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Jun 05 14:39:47 fir-md1-s2 kernel: Lustre: 20556:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-2), not sending early reply req@ffff8d09a7f7ec00 x1659566700454592/t0(0) o101->b1944413-b28e-4@10.50.1.52@o2ib2:417/0 lens 376/1600 e 12 to 0 dl 1591393192 ref 2 fl Interpret:/0/0 rc 0/0 Jun 05 14:39:47 fir-md1-s2 kernel: Lustre: 20556:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages Jun 05 14:39:49 fir-md1-s2 kernel: Lustre: 23760:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-3), not sending early reply req@ffff8d2171322d00 x1659566700454400/t0(0) o101->b1944413-b28e-4@10.50.1.52@o2ib2:418/0 lens 376/1600 e 12 to 0 dl 1591393193 ref 2 fl Interpret:/0/0 rc 0/0 Jun 05 14:39:49 fir-md1-s2 kernel: Lustre: 23760:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Jun 05 14:39:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 14:39:52 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 05 14:39:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.55@o2ib2) Jun 05 14:39:52 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 05 14:45:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 05bde955-e83a-4 (at 10.50.1.51@o2ib2) reconnecting Jun 05 14:45:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 
10.50.1.54@o2ib2) Jun 05 14:45:00 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 14:45:00 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 14:45:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 05 14:45:15 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 14:45:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0f444f62-1c09-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0936f7e800, cur 1591393546 expire 1591393396 last 1591393319 Jun 05 14:48:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 14:48:06 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 14:50:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a72a2280-4c81-4 (at 10.50.1.54@o2ib2) reconnecting Jun 05 14:50:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 05 14:50:07 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 14:55:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 05 14:55:14 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 14:57:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f61c46d3-6738-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d02cd3da800, cur 1591394232 expire 1591394082 last 1591394005 Jun 05 15:00:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 15:00:14 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 15:00:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 15:00:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 05 15:00:21 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 15:00:21 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 15:05:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 05 15:05:28 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 05 15:06:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 7b1fbed1-770f-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d02b5f6a800, cur 1591394776 expire 1591394626 last 1591394549 Jun 05 15:10:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a72a2280-4c81-4 (at 10.50.1.54@o2ib2) reconnecting Jun 05 15:10:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.51@o2ib2) Jun 05 15:10:35 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 05 15:10:35 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 15:11:56 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 15:11:56 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages Jun 05 15:20:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 8d039ea4-d34b-4 (at 10.50.1.56@o2ib2) reconnecting Jun 05 15:20:49 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 15:20:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 15:20:49 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 15:23:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 15:23:14 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 15:27:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e149fb1f-7c26-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44b68a400, cur 1591396042 expire 1591395892 last 1591395815 Jun 05 15:31:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 15:31:03 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 05 15:31:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 15:31:03 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 15:34:06 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 15:34:06 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 15:41:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 15:41:17 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 15:41:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 15:41:17 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 15:41:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 14a0f6f9-d5db-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce7d46fa800, cur 1591396897 expire 1591396747 last 1591396670 Jun 05 15:44:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 15:44:08 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 15:44:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client cfb768b3-3475-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0935350000, cur 1591397083 expire 1591396933 last 1591396856 Jun 05 15:51:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 15:51:31 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 15:51:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 15:51:31 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 15:56:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 15:56:16 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 15:57:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 412ca044-9605-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce880727400, cur 1591397855 expire 1591397705 last 1591397628 Jun 05 16:01:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 16:01:45 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:01:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 16:01:45 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 16:07:58 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 16:07:58 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages Jun 05 16:09:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e5d6c16d-39e6-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d032fa17800, cur 1591398579 expire 1591398429 last 1591398352 Jun 05 16:11:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 16:11:59 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:11:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 16:11:59 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 16:19:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 16:19:16 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 16:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 16:22:13 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:22:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 16:22:13 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:30:08 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 16:30:08 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 16:32:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 16:32:27 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:32:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 16:32:27 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:35:10 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.49.23.15@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d024b70b180/0x1587f604549222f9 lrc: 3/0,0 mode: PR/PR res: [0x240000406:0x138:0x0].0x0 bits 0x13/0x0 rrc: 601 type: IBT flags: 0x60200400000020 nid: 10.49.23.15@o2ib1 remote: 0x633e163c38ffc6c8 expref: 16 pid: 23987 timeout: 166013 lvb_type: 0 Jun 05 16:35:10 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 5 previous similar messages Jun 05 16:40:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 16:40:10 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 16:42:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 16:42:41 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:42:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 16:42:41 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages Jun 05 16:52:17 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 16:52:17 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 16:52:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 16:52:55 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:52:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 16:52:55 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 16:55:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 448cf5ef-d74b-4 (at 10.50.7.9@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44beb4c00, cur 1591401319 expire 1591401169 last 1591401092 Jun 05 17:03:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 17:03:09 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 17:03:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 17:03:09 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages Jun 05 17:04:00 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 17:04:00 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages Jun 05 17:13:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 17:13:23 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 17:13:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 17:13:23 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 17:15:17 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 17:15:17 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 17:23:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 17:23:37 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 17:23:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 17:23:37 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages Jun 05 17:26:10 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 17:26:10 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 17:27:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9237d27b-fa8f-4 (at 10.50.13.11@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d24755ef400, cur 1591403228 expire 1591403078 last 1591403001 Jun 05 17:27:08 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Jun 05 17:33:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 05 17:33:51 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 17:33:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 05 17:33:51 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 05 17:36:12 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 17:36:12 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 17:36:33 fir-md1-s2 kernel: Lustre: 20996:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591403786/real 1591403786] req@ffff8d01297ee300 x1668535016611904/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591403793 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 05 17:36:33 fir-md1-s2 kernel: Lustre: 20996:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 34 previous similar messages Jun 05 17:37:03 fir-md1-s2 kernel: LNet: Service thread pid 23987 completed after 11237.58s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jun 05 17:37:03 fir-md1-s2 kernel: LNet: Skipped 22 previous similar messages Jun 05 17:37:08 fir-md1-s2 kernel: Lustre: 20987:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591403821/real 1591403821] req@ffff8d2197647080 x1668535016612032/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591403828 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 05 17:37:08 fir-md1-s2 kernel: Lustre: 20987:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Jun 05 17:37:22 fir-md1-s2 kernel: LustreError: 20996:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.50.14.3@o2ib2) returned error from blocking AST (req@ffff8d01297ee300 x1668535016611904 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d075cb9e780/0x1587f604bd9dbb23 lrc: 4/0,0 mode: PR/PR res: [0x240052915:0x1190:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.50.14.3@o2ib2 remote: 0x571bb9ced2389249 expref: 361 pid: 23659 timeout: 169894 lvb_type: 0 Jun 05 17:37:22 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.50.14.3@o2ib2 was evicted due to a lock blocking callback time out: rc -107 Jun 05 17:37:22 fir-md1-s2 kernel: LustreError: Skipped 11 previous similar messages Jun 05 17:37:22 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 56s: evicting client at 10.50.14.3@o2ib2 ns: mdt-fir-MDT0001_UUID lock: ffff8d23b2fb2d00/0x1587f604bd9dbb3f lrc: 3/0,0 mode: PR/PR res: [0x24005b282:0xa5d:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.50.14.3@o2ib2 remote: 0x571bb9ced2389250 expref: 362 pid: 24064 timeout: 0 lvb_type: 0 Jun 05 17:37:22 fir-md1-s2 kernel: LustreError: 20996:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 3 previous similar messages Jun 05 17:48:19 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 
10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 17:48:19 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 17:50:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.13.11@o2ib2) Jun 05 17:50:31 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 05 17:58:21 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 17:58:21 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 18:08:49 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 18:08:49 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 18:18:51 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 18:18:51 fir-md1-s2 kernel: LustreError: Skipped 3 previous similar messages Jun 05 18:30:33 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 18:30:33 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages Jun 05 18:41:51 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 18:41:51 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 18:52:43 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 18:52:43 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 19:02:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 19:02:45 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 19:14:52 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 19:14:52 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 19:26:35 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 19:26:35 fir-md1-s2 kernel: LustreError: Skipped 5 previous similar messages Jun 05 19:37:52 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 19:37:52 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 19:48:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Jun 05 19:48:45 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 19:58:47 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 19:58:47 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 20:10:54 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 20:10:54 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 20:20:56 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 20:20:56 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 20:31:24 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.49.23.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server. Jun 05 20:31:24 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages Jun 05 20:43:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 34df34d0-90af-4 (at 10.49.23.15@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d2426a0a800, cur 1591415012 expire 1591414862 last 1591414785 Jun 05 21:12:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.23.15@o2ib1) Jun 05 23:38:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 05 23:39:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e03f06dd-9904-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0931b0a800, cur 1591425562 expire 1591425412 last 1591425335 Jun 05 23:56:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client eac6489f-dbb9-4 (at 10.50.16.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44be7cc00, cur 1591426617 expire 1591426467 last 1591426390 Jun 06 00:14:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 00:14:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client cb323f20-972d-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0986f0ac00, cur 1591427696 expire 1591427546 last 1591427469 Jun 06 00:21:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.16.3@o2ib2) Jun 06 05:03:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 813f5bb8-31cd-4 (at 10.49.28.2@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44bb79000, cur 1591444997 expire 1591444847 last 1591444770 Jun 06 05:26:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.28.2@o2ib1) Jun 06 06:47:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 95e8f6b1-e21f-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce6c6f3a400, cur 1591451269 expire 1591451119 last 1591451042 Jun 06 06:48:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.9.37@o2ib2) Jun 06 06:52:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 06:53:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 69c8be54-086f-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0468726800, cur 1591451589 expire 1591451439 last 1591451362 Jun 06 10:01:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 10:01:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1ab64b73-de91-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cffe3a53400, cur 1591462914 expire 1591462764 last 1591462687 Jun 06 10:06:46 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591463199/real 1591463199] req@ffff8cf314666300 x1668536601096064/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591463206 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 06 10:06:46 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 06 10:07:00 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591463213/real 1591463213] req@ffff8cf314666300 x1668536601096064/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591463220 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:07:00 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 06 10:07:21 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591463234/real 1591463234] req@ffff8cf314666300 x1668536601096064/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591463241 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:07:21 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 06 10:07:56 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591463269/real 1591463269] 
req@ffff8cf314666300 x1668536601096064/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591463276 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:07:56 fir-md1-s2 kernel: Lustre: 23866:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 06 10:09:02 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591463335/real 1591463335] req@ffff8d0bda5a0000 x1668536601552064/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591463342 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:09:02 fir-md1-s2 kernel: Lustre: 23996:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 12 previous similar messages Jun 06 10:09:13 fir-md1-s2 kernel: LustreError: 23866:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) failed to reply to blocking AST (req@ffff8cf314666300 x1668536601096064 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d072b2698c0/0x1587f6074d67486c lrc: 4/0,0 mode: PR/PR res: [0x24005a222:0xb62b:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0xeaf926f5effa8a08 expref: 5010 pid: 20958 timeout: 229399 lvb_type: 0 Jun 06 10:09:13 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -110 Jun 06 10:09:13 fir-md1-s2 kernel: LustreError: Skipped 2 previous similar messages Jun 06 10:09:13 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d072b2698c0/0x1587f6074d67486c lrc: 3/0,0 mode: PR/PR res: [0x24005a222:0xb62b:0x0].0x0 bits 0x13/0x0 rrc: 33 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0xeaf926f5effa8a08 expref: 5011 pid: 20958 timeout: 0 lvb_type: 0 Jun 06 10:09:13 fir-md1-s2 
kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 06 10:09:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 10:21:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 10:21:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 551e7d2d-a59a-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0391e58c00, cur 1591464105 expire 1591463955 last 1591463878 Jun 06 10:30:32 fir-md1-s2 kernel: Lustre: 23992:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591464625/real 1591464625] req@ffff8d0476c5ba80 x1668536630034112/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591464632 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 06 10:30:32 fir-md1-s2 kernel: Lustre: 23992:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jun 06 10:30:53 fir-md1-s2 kernel: Lustre: 23992:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591464646/real 1591464646] req@ffff8d0476c5ba80 x1668536630034112/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591464653 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:30:53 fir-md1-s2 kernel: Lustre: 23992:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 06 10:31:29 fir-md1-s2 kernel: Lustre: 23992:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591464681/real 1591464681] req@ffff8d0476c5ba80 x1668536630034112/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591464688 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:31:29 fir-md1-s2 kernel: Lustre: 23992:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 06 10:32:33 fir-md1-s2 kernel: Lustre: 
20958:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591464746/real 1591464746] req@ffff8d088777b600 x1668536630383936/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591464753 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 10:32:33 fir-md1-s2 kernel: Lustre: 20958:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Jun 06 10:32:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 10:32:46 fir-md1-s2 kernel: LustreError: 23992:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) returned error from blocking AST (req@ffff8d0476c5ba80 x1668536630034112 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d02a5a8a640/0x1587f6075d6743ee lrc: 4/0,0 mode: PR/PR res: [0x24005a46e:0x45a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0xb16e1c301cc5fbee expref: 6383 pid: 24063 timeout: 230818 lvb_type: 0 Jun 06 10:32:46 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -107 Jun 06 10:32:46 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 141s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d02a5a8a640/0x1587f6075d6743ee lrc: 3/0,0 mode: PR/PR res: [0x24005a46e:0x45a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0xb16e1c301cc5fbee expref: 6384 pid: 24063 timeout: 0 lvb_type: 0 Jun 06 10:43:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 10:44:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 12ee7b97-b20d-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0cdfae1000, cur 1591465455 expire 1591465305 last 1591465228 Jun 06 12:44:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 12:44:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8a6497f1-c50b-4 (at 10.49.27.27@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d240c308000, cur 1591472699 expire 1591472549 last 1591472472 Jun 06 13:07:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 95587c94-fbb9-4 (at 10.49.29.8@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44c314c00, cur 1591474066 expire 1591473916 last 1591473839 Jun 06 13:08:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.29.8@o2ib1) Jun 06 13:36:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 06 13:37:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e4cc0ef7-a588-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d0986fde800, cur 1591475829 expire 1591475679 last 1591475602 Jun 06 13:40:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 06 13:41:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c7ac835d-8cb6-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8d0936f79000, cur 1591476077 expire 1591475927 last 1591475850 Jun 06 13:51:32 fir-md1-s2 kernel: Lustre: 23959:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591476685/real 1591476685] req@ffff8ceac7346780 x1668536948785088/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591476692 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 06 13:51:32 fir-md1-s2 kernel: Lustre: 23959:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Jun 06 13:51:53 fir-md1-s2 kernel: Lustre: 23959:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591476706/real 1591476706] req@ffff8ceac7346780 x1668536948785088/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591476713 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 13:51:53 fir-md1-s2 kernel: Lustre: 23959:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 06 13:52:26 fir-md1-s2 kernel: Lustre: 23660:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591476739/real 1591476739] req@ffff8d02c477ec00 x1668536949015488/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591476746 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 06 13:52:26 fir-md1-s2 kernel: Lustre: 23660:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Jun 06 13:52:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 06 13:53:10 fir-md1-s2 kernel: LustreError: 23959:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) failed to reply to blocking AST (req@ffff8ceac7346780 x1668536948785088 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d0ebfb29440/0x1587f607f89c2219 lrc: 4/0,0 mode: PR/PR res: [0x24005a19e:0xf5f4:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 
remote: 0x65bf448c10074093 expref: 6498 pid: 23776 timeout: 242786 lvb_type: 0 Jun 06 13:53:10 fir-md1-s2 kernel: LustreError: 23959:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message Jun 06 13:53:10 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -110 Jun 06 13:53:10 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Jun 06 13:53:10 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d0ebfb29440/0x1587f607f89c2219 lrc: 3/0,0 mode: PR/PR res: [0x24005a19e:0xf5f4:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x65bf448c10074093 expref: 6499 pid: 23776 timeout: 0 lvb_type: 0 Jun 06 13:53:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 2da258f5-b367-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d093a27ec00, cur 1591476838 expire 1591476688 last 1591476611 Jun 06 13:54:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 06 14:06:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0d58400b-0d96-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce4a6d8ec00, cur 1591477610 expire 1591477460 last 1591477383 Jun 06 14:11:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 06 14:21:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 06 14:22:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f07c092d-72e4-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8ce4a77e2c00, cur 1591478530 expire 1591478380 last 1591478303 Jun 06 16:12:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.29.6@o2ib1) Jun 06 17:25:30 fir-md1-s2 kernel: LNet: Service thread pid 21092 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 06 17:25:30 fir-md1-s2 kernel: Pid: 21092, comm: mdt00_006 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 17:25:30 fir-md1-s2 kernel: Call Trace: Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 17:25:30 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 17:25:30 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 17:25:30 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591489530.21092 Jun 06 17:25:30 fir-md1-s2 kernel: Pid: 23663, comm: mdt00_019 
3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 17:25:30 fir-md1-s2 kernel: Call Trace: Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 17:25:30 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 17:25:30 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 17:25:30 fir-md1-s2 kernel: Pid: 23702, comm: mdt00_029 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 17:25:30 fir-md1-s2 kernel: Call Trace: Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 17:25:30 fir-md1-s2 
kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 17:25:30 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 17:25:30 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 17:25:30 fir-md1-s2 kernel: Pid: 23905, comm: mdt00_073 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 17:25:30 fir-md1-s2 kernel: Call Trace: Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 17:25:30 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 17:25:31 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 17:25:31 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 17:25:31 fir-md1-s2 kernel: LNet: Service thread pid 23668 was inactive for 200.90s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 06 17:25:31 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Jun 06 17:25:31 fir-md1-s2 kernel: Pid: 23668, comm: mdt03_025 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 17:25:31 fir-md1-s2 kernel: Call Trace: Jun 06 17:25:31 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 17:25:31 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 17:25:31 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 17:25:31 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 17:25:31 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 17:25:31 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 17:25:31 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 17:25:31 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 17:25:31 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 
Jun 06 17:25:31 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 17:25:31 fir-md1-s2 kernel: LNet: Service thread pid 20935 was inactive for 201.04s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jun 06 17:25:31 fir-md1-s2 kernel: LNet: Skipped 36 previous similar messages Jun 06 17:27:10 fir-md1-s2 kernel: LustreError: 23770:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591489330, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8d0139bfe540/0x1587f608b587843c lrc: 3/0,1 mode: --/EX res: [0x24003e244:0x3b51:0x0].0x0 bits 0x8/0x0 rrc: 92 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23770 timeout: 0 lvb_type: 0 Jun 06 17:27:10 fir-md1-s2 kernel: LustreError: 23770:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) Skipped 40 previous similar messages Jun 06 17:27:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b1944413-b28e-4 (at 10.50.1.52@o2ib2) reconnecting Jun 06 17:27:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 06 17:27:10 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 06 17:32:11 fir-md1-s2 kernel: Lustre: 23966:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply req@ffff8d09a7f78d80 x1659566995502016/t0(0) o101->b1944413-b28e-4@10.50.1.52@o2ib2:521/0 lens 376/1600 e 12 to 0 dl 1591489936 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 17:32:11 fir-md1-s2 kernel: Lustre: 23966:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Jun 06 17:32:12 fir-md1-s2 kernel: Lustre: 23960:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-2), not sending early reply req@ffff8cec0060f500 x1661063055895872/t0(0) o101->05bde955-e83a-4@10.50.1.51@o2ib2:522/0 lens 376/1600 e 12 to 0 dl 1591489937 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 17:32:12 
fir-md1-s2 kernel: Lustre: 23960:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Jun 06 17:32:13 fir-md1-s2 kernel: Lustre: 23688:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-3), not sending early reply req@ffff8d1fd620cc80 x1659655954238656/t0(0) o101->27170965-5314-4@10.50.1.55@o2ib2:522/0 lens 376/1600 e 12 to 0 dl 1591489937 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 17:32:13 fir-md1-s2 kernel: Lustre: 23688:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 13 previous similar messages Jun 06 17:32:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 8d039ea4-d34b-4 (at 10.50.1.56@o2ib2) reconnecting Jun 06 17:32:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.51@o2ib2) Jun 06 17:32:17 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 06 17:32:17 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 17:32:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.55@o2ib2) Jun 06 17:32:18 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 06 17:37:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 05bde955-e83a-4 (at 10.50.1.51@o2ib2) reconnecting Jun 06 17:37:24 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Jun 06 17:37:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 06 17:42:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 06 17:42:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.54@o2ib2) Jun 06 17:42:31 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 17:42:31 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 06 17:47:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.54@o2ib2) Jun 06 17:47:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 17:52:45 fir-md1-s2 kernel: 
Lustre: fir-MDT0001: Client 05bde955-e83a-4 (at 10.50.1.51@o2ib2) reconnecting Jun 06 17:52:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 06 17:52:45 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 06 17:52:45 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 06 17:57:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 06 17:57:52 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 06 18:02:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a72a2280-4c81-4 (at 10.50.1.54@o2ib2) reconnecting Jun 06 18:02:59 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 06 18:02:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 06 18:02:59 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 06 18:08:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 06 18:08:06 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 18:13:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 8d039ea4-d34b-4 (at 10.50.1.56@o2ib2) reconnecting Jun 06 18:13:13 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 06 18:13:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.51@o2ib2) Jun 06 18:13:13 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Jun 06 18:23:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 05bde955-e83a-4 (at 10.50.1.51@o2ib2) reconnecting Jun 06 18:23:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 06 18:23:27 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 06 18:23:27 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 06 18:33:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 15368feb-5478-4 (at 10.50.1.53@o2ib2) reconnecting Jun 06 18:33:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: 
Connection restored to (at 10.50.1.54@o2ib2) Jun 06 18:33:41 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 06 18:33:41 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 06 18:43:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a72a2280-4c81-4 (at 10.50.1.54@o2ib2) reconnecting Jun 06 18:43:55 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 06 18:43:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 06 18:43:55 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 06 18:54:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 8d039ea4-d34b-4 (at 10.50.1.56@o2ib2) reconnecting Jun 06 18:54:09 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 06 18:54:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.51@o2ib2) Jun 06 18:54:09 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Jun 06 18:54:44 fir-md1-s2 kernel: LNet: Service thread pid 23770 completed after 5554.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 06 18:54:44 fir-md1-s2 kernel: LNet: Skipped 32 previous similar messages Jun 06 19:34:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d39bceae-36bf-4 (at 10.50.15.1@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44be78000, cur 1591497243 expire 1591497093 last 1591497016 Jun 06 20:06:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.15.1@o2ib2) Jun 06 20:06:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 20:52:36 fir-md1-s2 kernel: LNet: Service thread pid 23666 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 06 20:52:36 fir-md1-s2 kernel: Pid: 23666, comm: mdt01_023 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 20:52:36 fir-md1-s2 kernel: Call Trace: Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 20:52:36 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 20:52:36 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 20:52:36 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591501956.23666 Jun 06 20:52:36 fir-md1-s2 kernel: Pid: 24033, comm: mdt01_103 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 20:52:36 fir-md1-s2 kernel: Call Trace: Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 20:52:36 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 20:52:36 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 20:52:36 fir-md1-s2 kernel: Pid: 23947, comm: mdt01_086 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 20:52:36 fir-md1-s2 kernel: Call Trace: Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] 
ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 20:52:36 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 20:52:36 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 20:52:36 fir-md1-s2 kernel: Pid: 23931, comm: mdt03_078 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 20:52:36 fir-md1-s2 kernel: Call Trace: Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 20:52:36 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 20:52:36 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 
20:52:36 fir-md1-s2 kernel: LNet: Service thread pid 23887 was inactive for 200.58s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 06 20:52:36 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Jun 06 20:52:36 fir-md1-s2 kernel: Pid: 23887, comm: mdt01_070 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 20:52:36 fir-md1-s2 kernel: Call Trace: Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 20:52:36 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 20:52:36 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 20:52:36 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 20:52:36 fir-md1-s2 kernel: LNet: Service thread pid 23669 was inactive for 200.73s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Jun 06 20:52:36 fir-md1-s2 kernel: LNet: Skipped 36 previous similar messages Jun 06 20:54:16 fir-md1-s2 kernel: LustreError: 24052:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591501756, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ce9de266c00/0x1587f6095ae650b3 lrc: 3/0,1 mode: --/EX res: [0x24003e244:0x3bb4:0x0].0x0 bits 0x8/0x0 rrc: 97 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 24052 timeout: 0 lvb_type: 0 Jun 06 20:54:16 fir-md1-s2 kernel: LustreError: 23963:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591501756, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8cebb6e3c380/0x1587f6095ae65097 lrc: 3/0,1 mode: --/EX res: [0x24003e244:0x3bb4:0x0].0x0 bits 0x8/0x0 rrc: 96 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23963 timeout: 0 lvb_type: 0 Jun 06 20:54:16 fir-md1-s2 kernel: LustreError: 23963:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) Skipped 10 previous similar messages Jun 06 20:54:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b1944413-b28e-4 (at 10.50.1.52@o2ib2) reconnecting Jun 06 20:54:16 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Jun 06 20:54:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.52@o2ib2) Jun 06 20:54:16 fir-md1-s2 kernel: LustreError: 24052:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) Skipped 20 previous similar messages Jun 06 20:59:11 fir-md1-s2 kernel: Lustre: 21158:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ce6902ccc80 x1659655984583808/t0(0) o101->27170965-5314-4@10.50.1.55@o2ib2:106/0 lens 376/1600 e 24 to 0 dl 1591502356 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 20:59:11 fir-md1-s2 kernel: Lustre: 
21158:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Jun 06 20:59:11 fir-md1-s2 kernel: Lustre: 22723:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8d0012e24380 x1659655984584256/t0(0) o101->27170965-5314-4@10.50.1.55@o2ib2:106/0 lens 376/1600 e 24 to 0 dl 1591502356 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 20:59:11 fir-md1-s2 kernel: Lustre: 22723:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Jun 06 20:59:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 27170965-5314-4 (at 10.50.1.55@o2ib2) reconnecting Jun 06 20:59:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.54@o2ib2) Jun 06 20:59:17 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 06 20:59:17 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 20:59:18 fir-md1-s2 kernel: Lustre: 20935:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-2), not sending early reply req@ffff8d246df19b00 x1661063086464128/t0(0) o101->05bde955-e83a-4@10.50.1.51@o2ib2:113/0 lens 376/1600 e 12 to 0 dl 1591502363 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 20:59:18 fir-md1-s2 kernel: Lustre: 20935:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Jun 06 21:04:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b1944413-b28e-4 (at 10.50.1.52@o2ib2) reconnecting Jun 06 21:04:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.56@o2ib2) Jun 06 21:04:32 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 06 21:04:32 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Jun 06 21:09:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.55@o2ib2) Jun 06 21:09:18 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 21:09:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 8d039ea4-d34b-4 (at 
10.50.1.56@o2ib2) reconnecting Jun 06 21:09:40 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages Jun 06 21:14:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 06 21:14:48 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Jun 06 21:19:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.54@o2ib2) Jun 06 21:19:19 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Jun 06 21:19:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 71155e77-becc-4 (at 10.50.1.50@o2ib2) reconnecting Jun 06 21:19:56 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Jun 06 21:25:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.50@o2ib2) Jun 06 21:25:04 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages Jun 06 21:25:16 fir-md1-s2 kernel: LNet: Service thread pid 23679 completed after 2160.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Jun 06 21:25:16 fir-md1-s2 kernel: LNet: Skipped 14 previous similar messages Jun 06 21:33:45 fir-md1-s2 kernel: LustreError: 23699:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591504125, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8d0251ecca40/0x1587f609781a8d60 lrc: 3/0,1 mode: --/EX res: [0x24003e244:0x3bb7:0x0].0x0 bits 0x8/0x0 rrc: 93 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 23699 timeout: 0 lvb_type: 0 Jun 06 21:33:45 fir-md1-s2 kernel: LustreError: 23699:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) Skipped 32 previous similar messages Jun 06 21:33:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 71155e77-becc-4 (at 10.50.1.50@o2ib2) reconnecting Jun 06 21:33:45 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 06 21:41:15 fir-md1-s2 kernel: Lustre: 23606:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply req@ffff8d02dfa41200 x1659307581833984/t0(0) o101->15368feb-5478-4@10.50.1.53@o2ib2:365/0 lens 376/1600 e 0 to 0 dl 1591504880 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 21:41:15 fir-md1-s2 kernel: Lustre: 23606:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 27 previous similar messages Jun 06 21:41:16 fir-md1-s2 kernel: Lustre: 24032:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/-151), not sending early reply req@ffff8d1161bf8480 x1659567028878464/t0(0) o101->b1944413-b28e-4@10.50.1.52@o2ib2:365/0 lens 376/1600 e 0 to 0 dl 1591504880 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 21:41:16 fir-md1-s2 kernel: Lustre: 24032:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Jun 06 21:41:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.1.53@o2ib2) Jun 06 21:41:21 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Jun 06 21:46:15 fir-md1-s2 
kernel: Lustre: 22735:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-450), not sending early reply req@ffff8d0472974800 x1659655987447104/t0(0) o101->27170965-5314-4@10.50.1.55@o2ib2:665/0 lens 376/1600 e 0 to 0 dl 1591505180 ref 2 fl Interpret:/0/0 rc 0/0 Jun 06 21:46:15 fir-md1-s2 kernel: Lustre: 22735:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Jun 06 21:46:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 27170965-5314-4 (at 10.50.1.55@o2ib2) reconnecting Jun 06 21:46:21 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 06 21:48:47 fir-md1-s2 kernel: LNet: Service thread pid 20939 was inactive for 1202.15s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 06 21:48:47 fir-md1-s2 kernel: Pid: 20939, comm: mdt01_004 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 21:48:47 fir-md1-s2 kernel: Call Trace: Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x860 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 21:48:47 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 21:48:47 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 21:48:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591505327.20939 Jun 06 21:48:47 fir-md1-s2 kernel: Pid: 23699, comm: mdt01_032 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 21:48:47 fir-md1-s2 kernel: Call Trace: Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x860 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 21:48:47 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 21:48:47 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 21:48:47 fir-md1-s2 kernel: Pid: 23702, comm: mdt00_029 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 
06 21:48:47 fir-md1-s2 kernel: Call Trace: Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x860 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 21:48:47 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 21:48:47 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 21:48:47 fir-md1-s2 kernel: Pid: 23760, comm: mdt03_041 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 21:48:47 fir-md1-s2 kernel: Call Trace: Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x860 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 21:48:47 fir-md1-s2 
kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 21:48:47 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 21:48:47 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 21:48:47 fir-md1-s2 kernel: LNet: Service thread pid 23719 was inactive for 1202.67s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 06 21:48:47 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Jun 06 21:48:47 fir-md1-s2 kernel: Pid: 23719, comm: mdt03_033 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 06 21:48:47 fir-md1-s2 kernel: Call Trace: Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_completion_ast+0x4e5/0x860 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_layout_change+0x20b/0x480 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_layout+0x8a0/0xe00 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Jun 06 21:48:47 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] 
ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 06 21:48:47 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 06 21:48:47 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 06 21:48:47 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 06 21:48:47 fir-md1-s2 kernel: LNet: Service thread pid 23743 was inactive for 1202.82s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jun 06 21:48:47 fir-md1-s2 kernel: LNet: Skipped 39 previous similar messages Jun 06 21:53:15 fir-md1-s2 kernel: LNet: Service thread pid 23976 completed after 1470.02s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 06 21:53:15 fir-md1-s2 kernel: LNet: Skipped 62 previous similar messages Jun 07 03:16:45 fir-md1-s2 kernel: Lustre: 23909:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591524998/real 1591524998] req@ffff8d0e7ea15e80 x1668540318099584/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591525005 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 07 03:16:45 fir-md1-s2 kernel: Lustre: 23909:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 13 previous similar messages Jun 07 03:16:59 fir-md1-s2 kernel: Lustre: 23909:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591525012/real 1591525012] req@ffff8d0e7ea15e80 x1668540318099584/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591525019 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 07 03:16:59 fir-md1-s2 kernel: Lustre: 23909:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message 
Jun 07 03:17:20 fir-md1-s2 kernel: Lustre: 23909:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591525033/real 1591525033] req@ffff8d0e7ea15e80 x1668540318099584/t0(0) o104->fir-MDT0001@10.49.27.27@o2ib1:15/16 lens 296/224 e 0 to 1 dl 1591525040 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 07 03:17:20 fir-md1-s2 kernel: Lustre: 23909:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 07 03:17:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.27.27@o2ib1) Jun 07 03:17:46 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages Jun 07 03:17:48 fir-md1-s2 kernel: LustreError: 23909:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) returned error from blocking AST (req@ffff8d0e7ea15e80 x1668540318099584 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d01f629ee40/0x1587f60a813e710b lrc: 4/0,0 mode: PR/PR res: [0x240056d6f:0x8169:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a379a4fc expref: 117567 pid: 23774 timeout: 291071 lvb_type: 0 Jun 07 03:17:48 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -107 Jun 07 03:17:48 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 70s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d01f629ee40/0x1587f60a813e710b lrc: 3/0,0 mode: PR/PR res: [0x240056d6f:0x8169:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a379a4fc expref: 117568 pid: 23774 timeout: 0 lvb_type: 0 Jun 07 03:18:12 fir-md1-s2 kernel: LustreError: 23765:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) returned error from blocking AST (req@ffff8cf26f413180 x1668540318516352 status -107 
rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d0f5d278000/0x1587f60a813e7357 lrc: 4/0,0 mode: PR/PR res: [0x240059ffa:0x1b358:0x0].0x0 bits 0x13/0x0 rrc: 37 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a379a511 expref: 43782 pid: 24023 timeout: 291094 lvb_type: 0 Jun 07 03:18:12 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -107 Jun 07 03:18:12 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d0f5d278000/0x1587f60a813e7357 lrc: 3/0,0 mode: PR/PR res: [0x240059ffa:0x1b358:0x0].0x0 bits 0x13/0x0 rrc: 35 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a379a511 expref: 43694 pid: 24023 timeout: 0 lvb_type: 0 Jun 07 03:18:24 fir-md1-s2 kernel: LustreError: 23819:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) returned error from blocking AST (req@ffff8cf46f615a00 x1668540319747584 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d067a7e7740/0x1587f60a81347ad4 lrc: 4/0,0 mode: CR/CR res: [0x24005b2a2:0x70d:0x0].0x0 bits 0x9/0x0 rrc: 153 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a3796657 expref: 27262 pid: 23909 timeout: 291106 lvb_type: 0 Jun 07 03:18:24 fir-md1-s2 kernel: LustreError: 23819:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) Skipped 1 previous similar message Jun 07 03:18:24 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -107 Jun 07 03:18:24 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message Jun 07 03:18:24 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.49.27.27@o2ib1 ns: 
mdt-fir-MDT0001_UUID lock: ffff8d067a7e7740/0x1587f60a81347ad4 lrc: 3/0,0 mode: CR/CR res: [0x24005b2a2:0x70d:0x0].0x0 bits 0x9/0x0 rrc: 153 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a3796657 expref: 27173 pid: 23909 timeout: 0 lvb_type: 0 Jun 07 03:18:24 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Jun 07 03:18:26 fir-md1-s2 kernel: LustreError: 23917:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) returned error from blocking AST (req@ffff8d246e3d2d00 x1668540320024448 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d0710303180/0x1587f60a812f9112 lrc: 4/0,0 mode: CR/CR res: [0x240055a7a:0x15123:0x0].0x0 bits 0x9/0x0 rrc: 157 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a37957a8 expref: 24099 pid: 23793 timeout: 291109 lvb_type: 0 Jun 07 03:18:26 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -107 Jun 07 03:18:26 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d0710303180/0x1587f60a812f9112 lrc: 3/0,0 mode: CR/CR res: [0x240055a7a:0x15123:0x0].0x0 bits 0x9/0x0 rrc: 156 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a37957a8 expref: 24032 pid: 23793 timeout: 0 lvb_type: 0 Jun 07 03:18:44 fir-md1-s2 kernel: LustreError: 22735:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.49.27.27@o2ib1) returned error from blocking AST (req@ffff8d03e1267500 x1668540320429312 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8d0e17370480/0x1587f60a7d3c83f1 lrc: 4/0,0 mode: CR/CR res: [0x24005a149:0x10488:0x0].0x0 bits 0x9/0x0 rrc: 61 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 
0x4d92f61a375e7b5 expref: 4768 pid: 23744 timeout: 291126 lvb_type: 0 Jun 07 03:18:44 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.49.27.27@o2ib1 was evicted due to a lock blocking callback time out: rc -107 Jun 07 03:18:44 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.49.27.27@o2ib1 ns: mdt-fir-MDT0001_UUID lock: ffff8d0e17370480/0x1587f60a7d3c83f1 lrc: 3/0,0 mode: CR/CR res: [0x24005a149:0x10488:0x0].0x0 bits 0x9/0x0 rrc: 60 type: IBT flags: 0x60200400000020 nid: 10.49.27.27@o2ib1 remote: 0x4d92f61a375e7b5 expref: 4712 pid: 23744 timeout: 0 lvb_type: 0 Jun 07 04:35:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.21.21@o2ib1) Jun 07 04:36:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 24a6eb40-01b1-4 (at 10.49.21.21@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8cf44b9b0400, cur 1591529777 expire 1591529627 last 1591529550 Jun 07 04:45:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.21.21@o2ib1) Jun 07 04:46:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0adaeabc-2d30-4 (at 10.49.21.21@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d09353d9400, cur 1591530394 expire 1591530244 last 1591530167 Jun 07 04:54:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.21.21@o2ib1) Jun 07 04:54:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ca57e23a-edf3-4 (at 10.49.21.21@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8d02b2eb3800, cur 1591530888 expire 1591530738 last 1591530661 Jun 07 10:06:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.49.19.2@o2ib1) Jun 07 18:20:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 50704ba7-c3d8-4 (at 10.49.25.17@o2ib1) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff8d10245cc400, cur 1591579249 expire 1591579099 last 1591579022 Jun 09 05:26:03 fir-md1-s2 kernel: Lustre: 23923:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:03 fir-md1-s2 kernel: Lustre: 23963:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:03 fir-md1-s2 kernel: Lustre: 23963:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 592 previous similar messages Jun 09 05:26:04 fir-md1-s2 kernel: Lustre: 23812:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:04 fir-md1-s2 kernel: Lustre: 23812:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 1349 previous similar messages Jun 09 05:26:06 fir-md1-s2 kernel: Lustre: 23613:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:06 fir-md1-s2 kernel: Lustre: 23613:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 2327 previous similar messages Jun 09 05:26:10 fir-md1-s2 kernel: Lustre: 23690:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:10 fir-md1-s2 kernel: Lustre: 23690:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 4304 previous similar messages Jun 09 05:26:18 fir-md1-s2 kernel: Lustre: 23902:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:18 fir-md1-s2 kernel: Lustre: 23902:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 3725 previous similar messages Jun 09 05:26:34 fir-md1-s2 kernel: Lustre: 23806:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:26:34 fir-md1-s2 kernel: Lustre: 23806:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 4714 previous similar messages Jun 09 05:27:06 
fir-md1-s2 kernel: Lustre: 23995:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:27:06 fir-md1-s2 kernel: Lustre: 23995:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 9926 previous similar messages Jun 09 05:28:10 fir-md1-s2 kernel: Lustre: 23663:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 05:28:10 fir-md1-s2 kernel: Lustre: 23663:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 20552 previous similar messages Jun 09 07:53:55 fir-md1-s2 kernel: Lustre: 23941:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 07:53:55 fir-md1-s2 kernel: Lustre: 23941:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 19052 previous similar messages Jun 09 07:54:11 fir-md1-s2 kernel: Lustre: 23941:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 07:54:11 fir-md1-s2 kernel: Lustre: 23941:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 7819 previous similar messages Jun 09 07:54:43 fir-md1-s2 kernel: Lustre: 23941:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 07:54:43 fir-md1-s2 kernel: Lustre: 23941:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 10628 previous similar messages Jun 09 07:55:47 fir-md1-s2 kernel: Lustre: 23654:0:(mdd_device.c:1811:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Jun 09 07:55:47 fir-md1-s2 kernel: Lustre: 23654:0:(mdd_device.c:1811:mdd_changelog_clear()) Skipped 19881 previous similar messages Jun 09 09:24:47 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591719880/real 1591719880] req@ffff8cf23cf81200 x1668545386040384/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591719887 ref 
1 fl Rpc:X/0/ffffffff rc 0/-1 Jun 09 09:24:47 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Jun 09 09:24:54 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591719887/real 1591719887] req@ffff8cf23cf81200 x1668545386040384/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591719894 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 09 09:25:08 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591719901/real 1591719901] req@ffff8cf23cf81200 x1668545386040384/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591719908 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 09 09:25:08 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Jun 09 09:25:29 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591719922/real 1591719922] req@ffff8cf23cf81200 x1668545386040384/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591719929 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 09 09:25:29 fir-md1-s2 kernel: Lustre: 20552:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Jun 09 09:26:02 fir-md1-s2 kernel: Lustre: 22724:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591719955/real 1591719955] req@ffff8d131022d100 x1668545386397824/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591719962 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 09 09:26:02 fir-md1-s2 kernel: Lustre: 22724:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Jun 09 09:26:25 fir-md1-s2 kernel: LustreError: 20552:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 
10.50.14.3@o2ib2) failed to reply to blocking AST (req@ffff8cf23cf81200 x1668545386040384 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8cf0552c7bc0/0x1587f610dbfb85ed lrc: 4/0,0 mode: PR/PR res: [0x24005b2d1:0x2168:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.50.14.3@o2ib2 remote: 0xe5914c2fae0bcbbe expref: 7342 pid: 23930 timeout: 485981 lvb_type: 0 Jun 09 09:26:25 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.50.14.3@o2ib2 was evicted due to a lock blocking callback time out: rc -110 Jun 09 09:26:25 fir-md1-s2 kernel: LustreError: 20543:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.50.14.3@o2ib2 ns: mdt-fir-MDT0001_UUID lock: ffff8cf0552c7bc0/0x1587f610dbfb85ed lrc: 3/0,0 mode: PR/PR res: [0x24005b2d1:0x2168:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.50.14.3@o2ib2 remote: 0xe5914c2fae0bcbbe expref: 7343 pid: 23930 timeout: 0 lvb_type: 0 Jun 09 09:27:08 fir-md1-s2 kernel: Lustre: 23636:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591720021/real 1591720021] req@ffff8cfd8df86300 x1668545386580352/t0(0) o104->fir-MDT0001@10.50.14.3@o2ib2:15/16 lens 296/224 e 0 to 1 dl 1591720028 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Jun 09 09:27:08 fir-md1-s2 kernel: Lustre: 23636:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 12 previous similar messages Jun 09 09:28:01 fir-md1-s2 kernel: LNet: Service thread pid 23883 was inactive for 200.15s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Jun 09 09:28:01 fir-md1-s2 kernel: Pid: 23883, comm: mdt01_069 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 09 09:28:01 fir-md1-s2 kernel: Call Trace: Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 09 09:28:01 fir-md1-s2 kernel: [] lod_object_lock+0xf4/0x780 [lod] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 09 09:28:01 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 09 09:28:01 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 09 09:28:01 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720081.23883 Jun 09 09:28:01 fir-md1-s2 kernel: Pid: 23880, comm: mdt01_068 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 09 09:28:01 fir-md1-s2 kernel: Call Trace: Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 09 09:28:01 
fir-md1-s2 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 09 09:28:01 fir-md1-s2 kernel: [] lod_object_lock+0xf4/0x780 [lod] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 09 09:28:01 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 09 09:28:01 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 09 09:28:01 fir-md1-s2 kernel: Pid: 22732, comm: mdt02_013 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 09 09:28:01 fir-md1-s2 kernel: Call Trace: Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 09 09:28:01 fir-md1-s2 kernel: [] lod_object_lock+0xf4/0x780 [lod] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 09 09:28:01 fir-md1-s2 kernel: [] 
mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 09 09:28:01 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 09 09:28:01 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 09 09:28:01 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 09 09:28:01 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 09 09:28:01 fir-md1-s2 kernel: Pid: 23606, comm: mdt01_017 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 09 09:28:01 fir-md1-s2 kernel: Call Trace: Jun 09 09:28:01 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 09 09:28:02 fir-md1-s2 kernel: [] lod_object_lock+0xf4/0x780 [lod] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 09 
09:28:02 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 09 09:28:02 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 09 09:28:02 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 09 09:28:02 fir-md1-s2 kernel: LNet: Service thread pid 23953 was inactive for 200.54s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Jun 09 09:28:02 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Jun 09 09:28:02 fir-md1-s2 kernel: Pid: 23953, comm: mdt03_082 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019 Jun 09 09:28:02 fir-md1-s2 kernel: Call Trace: Jun 09 09:28:02 fir-md1-s2 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Jun 09 09:28:02 fir-md1-s2 kernel: [] lod_object_lock+0xf4/0x780 [lod] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_rename_lock+0xbe/0x4b0 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint_rename+0x2c5/0x2b90 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Jun 09 09:28:02 fir-md1-s2 kernel: [] tgt_request_handle+0xada/0x1570 
[ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] ptlrpc_main+0xb34/0x1470 [ptlrpc] Jun 09 09:28:02 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Jun 09 09:28:02 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Jun 09 09:28:02 fir-md1-s2 kernel: [] 0xffffffffffffffff Jun 09 09:28:02 fir-md1-s2 kernel: LNet: Service thread pid 24004 was inactive for 200.69s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jun 09 09:28:02 fir-md1-s2 kernel: LNet: Skipped 38 previous similar messages Jun 09 09:28:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720082.20958 Jun 09 09:28:03 fir-md1-s2 kernel: LNet: Service thread pid 23680 was inactive for 200.45s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jun 09 09:28:03 fir-md1-s2 kernel: LNet: Skipped 27 previous similar messages Jun 09 09:28:03 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720083.23680 Jun 09 09:28:04 fir-md1-s2 kernel: LNet: Service thread pid 23898 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jun 09 09:28:04 fir-md1-s2 kernel: LNet: Skipped 8 previous similar messages Jun 09 09:28:04 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720084.23898 Jun 09 09:28:05 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720085.23693 Jun 09 09:28:06 fir-md1-s2 kernel: LNet: Service thread pid 22722 was inactive for 200.36s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Jun 09 09:28:06 fir-md1-s2 kernel: LNet: Skipped 19 previous similar messages Jun 09 09:28:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720086.22722 Jun 09 09:28:07 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720087.23955 Jun 09 09:28:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720088.23866 Jun 09 09:28:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720089.22735 Jun 09 09:28:10 fir-md1-s2 kernel: LNet: Service thread pid 23916 was inactive for 200.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Jun 09 09:28:10 fir-md1-s2 kernel: LNet: Skipped 44 previous similar messages Jun 09 09:28:10 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1591720090.23916 Jun 09 09:28:11 fir-md1-s2 kernel: LNet: Service thread pid 23636 completed after 210.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Jun 09 09:28:11 fir-md1-s2 kernel: LNet: Skipped 21 previous similar messages Jun 09 09:28:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 09 09:31:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.50.14.3@o2ib2) Jun 09 09:32:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8779fdcd-13cc-4 (at 10.50.14.3@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ce6f637ac00, cur 1591720342 expire 1591720192 last 1591720115