Apr 30 08:30:10 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
Apr 30 08:30:10 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib)
Apr 30 08:30:11 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.0.pl9
Apr 30 08:30:11 fir-md1-s2 kernel: LNet: Using FastReg for registration
Apr 30 08:30:11 fir-md1-s2 kernel: LNetError: 7276:0:(o2iblnd_cb.c:2469:kiblnd_passive_connect()) Can't accept conn from 10.0.10.102@o2ib7 on NA (ib0:0:10.0.10.52): bad dst nid 10.0.10.52@o2ib7
Apr 30 08:30:11 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180]
Apr 30 08:31:14 fir-md1-s2 kernel: LDISKFS-fs (dm-1): file extents enabled
Apr 30 08:31:14 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled
Apr 30 08:31:14 fir-md1-s2 kernel: , maximum tree depth=5
Apr 30 08:31:14 fir-md1-s2 kernel: , maximum tree depth=5
Apr 30 08:31:14 fir-md1-s2 kernel: LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Apr 30 08:31:14 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Apr 30 08:31:14 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.23.31@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.9.107.67@o2ib4 (not set up)
Apr 30 08:31:15 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.9.107.26@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:15 fir-md1-s2 kernel: LustreError: Skipped 19 previous similar messages
Apr 30 08:31:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Imperative Recovery not enabled, recovery window 300-900
Apr 30 08:31:15 fir-md1-s2 kernel: Lustre: fir-MDD0003: changelog on
Apr 30 08:31:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: in recovery but waiting for the first client to connect
Apr 30 08:31:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Will be in recovery for at least 5:00, or until 1334 clients reconnect
Apr 30 08:31:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Not available for connect from 10.9.108.32@o2ib4 (not set up)
Apr 30 08:31:15 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages
Apr 30 08:31:16 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0003-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -114
Apr 30 08:31:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 10.0.10.108@o2ib7 (at 10.0.10.108@o2ib7)
Apr 30 08:31:16 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Apr 30 08:31:16 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.16.3@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:16 fir-md1-s2 kernel: LustreError: Skipped 20 previous similar messages
Apr 30 08:31:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery not enabled, recovery window 300-900
Apr 30 08:31:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 07289107-15cf-b70a-a8d8-67d0d32bbec1 (at 10.9.108.29@o2ib4)
Apr 30 08:31:17 fir-md1-s2 kernel: Lustre: Skipped 24 previous similar messages
Apr 30 08:31:17 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on
Apr 30 08:31:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect
Apr 30 08:31:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 5:00, or until 1334 clients reconnect
Apr 30 08:31:18 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.25.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:18 fir-md1-s2 kernel: LustreError: Skipped 165 previous similar messages
Apr 30 08:31:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.1.12@o2ib6)
Apr 30 08:31:18 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Apr 30 08:31:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.109.1@o2ib4)
Apr 30 08:31:20 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 08:31:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: Denying connection for new client e6faa00b-070f-4d22-51ac-e59042b5a00c(at 10.8.12.33@o2ib6), waiting for 1334 known clients (29 recovered, 3 in progress, and 0 evicted) already passed deadline 0:04
Apr 30 08:31:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:31:22 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.106.25@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:22 fir-md1-s2 kernel: LustreError: Skipped 366 previous similar messages
Apr 30 08:31:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 55b451b3-fa82-3731-68a1-db9159c37dee (at 10.9.101.12@o2ib4)
Apr 30 08:31:24 fir-md1-s2 kernel: Lustre: Skipped 83 previous similar messages
Apr 30 08:31:30 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.27.34@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:30 fir-md1-s2 kernel: LustreError: Skipped 724 previous similar messages
Apr 30 08:31:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.101.18@o2ib4)
Apr 30 08:31:32 fir-md1-s2 kernel: Lustre: Skipped 109 previous similar messages
Apr 30 08:31:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 10.0.10.106@o2ib7 (at 10.0.10.106@o2ib7)
Apr 30 08:31:50 fir-md1-s2 kernel: Lustre: Skipped 2441 previous similar messages
Apr 30 08:31:52 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.0.10.51@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server.
Apr 30 08:31:52 fir-md1-s2 kernel: LustreError: Skipped 1222 previous similar messages
Apr 30 08:32:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client e6faa00b-070f-4d22-51ac-e59042b5a00c(at 10.8.12.33@o2ib6), waiting for 1334 known clients (1239 recovered, 92 in progress, and 0 evicted) already passed deadline 0:45
Apr 30 08:32:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:32:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery already passed deadline 1:01, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
Apr 30 08:32:17 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 08:32:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery over after 1:02, of 1334 clients 1334 recovered and 0 were evicted.
Apr 30 08:32:18 fir-md1-s2 kernel: LustreError: 122251:0:(mdt_io.c:470:mdt_preprw_write()) fir-MDT0003: WRITE IO to missing obj [0x280025da3:0xbd96:0x0]: rc = -2
Apr 30 08:32:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6)
Apr 30 08:32:29 fir-md1-s2 kernel: Lustre: Skipped 120 previous similar messages
Apr 30 08:36:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e6faa00b-070f-4d22-51ac-e59042b5a00c (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932e25f4b400, cur 1556638579 expire 1556638429 last 1556638352
Apr 30 08:37:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6)
Apr 30 08:37:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:41:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d443b2d2-ef37-1815-642c-90bcf5846a13 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9307a1e6c800, cur 1556638874 expire 1556638724 last 1556638647
Apr 30 08:41:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:42:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6)
Apr 30 08:46:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 16340969-a68c-2c9c-b520-102c6eb3e402 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931aad605000, cur 1556639169 expire 1556639019 last 1556638942
Apr 30 08:46:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:47:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6)
Apr 30 08:47:15 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Apr 30 08:51:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 766250dd-58a6-e61d-5dc0-8bd30c0c1542 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932076df3800, cur 1556639465 expire 1556639315 last 1556639238
Apr 30 08:51:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:54:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8a117c12-3498-0944-3fba-59968f43547b (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932077d9d000, cur 1556639692 expire 1556639542 last 1556639465
Apr 30 08:54:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 08:57:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6)
Apr 30 08:57:02 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Apr 30 08:58:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e8c62915-6c71-4c14-7626-f3601bfc0f0f (at 10.8.1.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315d27a5800, cur 1556639891 expire 1556639741 last 1556639664
Apr 30 08:58:11 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Apr 30 08:58:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e8c62915-6c71-4c14-7626-f3601bfc0f0f (at 10.8.1.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9326b8f10c00, cur 1556639909 expire 1556639759 last 1556639682
Apr 30 08:59:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 284ea256-9b3e-9c39-95b0-12b15648c3ad (at 10.8.12.33@o2ib6) in 160 seconds. I think it's dead, and I am evicting it. exp ffff93101fe10800, cur 1556639985 expire 1556639835 last 1556639825
Apr 30 09:00:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 284ea256-9b3e-9c39-95b0-12b15648c3ad (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931676b64c00, cur 1556640052 expire 1556639902 last 1556639825
Apr 30 09:05:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d8a8dacc-af5a-8d4e-b1a7-de9e8287dab6 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930bffb10c00, cur 1556640350 expire 1556640200 last 1556640123
Apr 30 09:05:50 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 09:27:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6)
Apr 30 09:27:11 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages
Apr 30 09:28:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b0015eb5-6efa-a3bc-bfd9-109e877d2725 (at 10.8.11.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9326b7f56400, cur 1556641731 expire 1556641581 last 1556641504
Apr 30 09:28:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 09:29:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.3@o2ib6)
Apr 30 09:29:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 09:30:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3eee9c96-c1b6-76d2-95c6-7eafc5882ffc (at 10.8.12.33@o2ib6) in 174 seconds. I think it's dead, and I am evicting it. exp ffff930803aa1000, cur 1556641807 expire 1556641657 last 1556641633
Apr 30 09:30:07 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages
Apr 30 09:34:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 276ceab7-f552-06e9-23f9-3786e759e6d7 (at 10.8.12.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9319a9a16c00, cur 1556642083 expire 1556641933 last 1556641856
Apr 30 09:34:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 09:54:07 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 10:02:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e80f6b46-7bcd-30a8-8491-3102d8ee0aa0 (at 10.8.25.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930731318000, cur 1556643773 expire 1556643623 last 1556643546
Apr 30 10:02:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 10:03:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.25.9@o2ib6)
Apr 30 10:03:00 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages
Apr 30 11:40:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6f8d180c-697b-87fe-2c39-64e5c1d542ef (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93107b2c5400, cur 1556649642 expire 1556649492 last 1556649415
Apr 30 11:40:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 11:41:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.33@o2ib6)
Apr 30 11:41:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 11:52:43 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 11:54:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e4faccdb-f303-9bdd-51a6-ad7a646ae559 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93280ba61000, cur 1556650484 expire 1556650334 last 1556650257
Apr 30 11:54:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 11:55:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.33@o2ib6)
Apr 30 11:55:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:09:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4b625669-b570-bf89-cdc9-b22aede67358 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93405a4ef800, cur 1556651382 expire 1556651232 last 1556651155
Apr 30 12:09:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:12:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 12:12:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:20:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 327c28a1-fe51-b704-2bde-95368e501f01 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9326b7f51800, cur 1556652015 expire 1556651865 last 1556651788
Apr 30 12:20:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:22:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.29@o2ib6)
Apr 30 12:22:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:24:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 459dc0b2-5f7f-24eb-f6a1-6e1030b48b5c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930f0bf69000, cur 1556652256 expire 1556652106 last 1556652029
Apr 30 12:24:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:28:16 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 12:28:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc (at 10.8.8.37@o2ib6) reconnecting
Apr 30 12:28:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.37@o2ib6)
Apr 30 12:28:25 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 12:28:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client fdc52cb1-2cc9-0d98-0f2b-dc082fc53acc (at 10.8.8.37@o2ib6) reconnecting
Apr 30 12:28:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.37@o2ib6)
Apr 30 12:32:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 12:32:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:41:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client acdb8e1f-3ab2-f130-36a6-60883f4fd9c7 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b1cef1800, cur 1556653262 expire 1556653112 last 1556653035
Apr 30 12:41:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 12:42:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 12:42:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:03:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f4319f02-25fa-20b6-a648-2516dd1744d4 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9327f4bc7800, cur 1556654628 expire 1556654478 last 1556654401
Apr 30 13:03:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:04:17 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:04:19 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:04:38 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:06:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 13:06:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:12:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2df8c138-8c23-ea09-17c8-c9239f9279b9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931954e3d000, cur 1556655144 expire 1556654994 last 1556654917
Apr 30 13:12:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:14:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.13.24@o2ib6)
Apr 30 13:14:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:14:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 919649d8-704c-889b-d1dd-a296af8855ee (at 10.8.13.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93278ceba000, cur 1556655299 expire 1556655149 last 1556655072
Apr 30 13:14:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:16:27 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:18:23 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:22:15 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:37:33 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:37:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 13:37:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:42:59 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:48:00 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 13:51:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d9eb9aeb-03ec-18ce-b78d-5769086dc54d (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b18656400, cur 1556657485 expire 1556657335 last 1556657258
Apr 30 13:51:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 13:54:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 13:54:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:02:43 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:02:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4115a52d-9eff-7ac8-6fc7-05e10e61ece9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932c8124a000, cur 1556658174 expire 1556658024 last 1556657947
Apr 30 14:02:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:06:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 14:06:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:10:28 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:16:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cb2147ca-63ad-9be6-4549-5a3714e7a68f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931d3b7da000, cur 1556659013 expire 1556658863 last 1556658786
Apr 30 14:16:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:21:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 14:21:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:30:39 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:32:37 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 14:32:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 49903e14-e267-be84-fd5c-bda9815f9fe4 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f463eb800, cur 1556659963 expire 1556659813 last 1556659736
Apr 30 14:32:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:32:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c4a32940-a9be-512a-496b-f65411562f7a (at 10.9.106.43@o2ib4) reconnecting
Apr 30 14:32:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.106.43@o2ib4)
Apr 30 14:35:12 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:36:07 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:36:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 14:36:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 14:44:34 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:46:13 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:47:53 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:48:54 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:48:54 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message
Apr 30 14:50:26 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 14:50:26 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message
Apr 30 14:55:23 fir-md1-s2 kernel: LNetError: 121183:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 14:55:23 fir-md1-s2 kernel: LNetError: 121183:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages
Apr 30 14:55:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client aecec54c-c922-2a32-e90d-f9ec22e83d8e (at 10.9.102.28@o2ib4) reconnecting
Apr 30 14:55:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.102.28@o2ib4)
Apr 30 14:58:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 5102b83b-e407-f2c8-158f-7c896c03ad6a (at 10.9.108.66@o2ib4) reconnecting
Apr 30 14:58:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.66@o2ib4)
Apr 30 14:58:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client ada1a0b6-ae0b-a8c8-e0dd-bdb0cfd4f651 (at 10.9.108.63@o2ib4) reconnecting
Apr 30 14:58:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.63@o2ib4)
Apr 30 15:02:16 fir-md1-s2 kernel: LNetError: 121181:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 15:02:16 fir-md1-s2 kernel: LNetError: 121181:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages
Apr 30 15:02:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 8cfb5fee-049f-f752-8467-9eee2daa3ede (at 10.9.108.27@o2ib4) reconnecting
Apr 30 15:02:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:02:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.27@o2ib4)
Apr 30 15:02:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:02:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client cdb4c7a0-6b16-edbd-92e2-6ff9e6ba9d7d (at 10.9.107.61@o2ib4) reconnecting
Apr 30 15:02:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.107.61@o2ib4)
Apr 30 15:10:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 75a243e1-9f6a-0fab-ec0e-ce32dad51415 (at 10.9.106.71@o2ib4) reconnecting
Apr 30 15:10:04 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 15:10:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.106.71@o2ib4)
Apr 30 15:10:04 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 15:11:18 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 15:11:18 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 13 previous similar messages
Apr 30 15:11:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 8b5d8390-159f-2695-61bf-234d327b2214 (at 10.9.115.10@o2ib4) reconnecting
Apr 30 15:11:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:11:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.115.10@o2ib4)
Apr 30 15:11:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:12:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ec01363a-b910-254e-075d-e7f3e6df1606 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f3c77bc00, cur 1556662335 expire 1556662185 last 1556662108
Apr 30 15:12:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:15:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client b03523f3-7393-4682-6529-e841828fdc86 (at 10.9.103.39@o2ib4) reconnecting
Apr 30 15:15:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:15:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.103.39@o2ib4)
Apr 30 15:15:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:17:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6)
Apr 30 15:17:39 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 15:20:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting
Apr 30 15:20:13 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages
Apr 30 15:20:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.113.3@o2ib4)
Apr 30 15:20:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:22:06 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 15:22:06 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 16 previous similar messages
Apr 30 15:22:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 8fbce908-69f3-9567-86d0-3b49733351a8 (at 10.9.104.50@o2ib4) reconnecting
Apr 30 15:22:21 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages
Apr 30 15:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 157601d4-8202-37f9-9e9d-20f9d37b0eae (at 10.9.104.42@o2ib4) reconnecting
Apr 30 15:31:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages
Apr 30 15:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.42@o2ib4)
Apr 30 15:31:13 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages
Apr 30 15:33:06 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 15:33:06 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages
Apr 30 15:39:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a078cd0f-7e7e-03be-ddc4-775ce28fae96 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9310183b0400, cur 1556663943 expire 1556663793 last 1556663716
Apr 30 15:39:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 15:40:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 315cf750-5ce7-61a0-093d-91bfc52b74be (at 10.8.17.10@o2ib6) reconnecting
Apr 30 15:40:27 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages
Apr 30 15:40:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.17.10@o2ib6)
Apr 30 15:40:27 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages
Apr 30 15:44:55 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 15:44:55 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 27 previous similar messages
Apr 30 15:53:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0cb811d4-9df8-f3cc-1525-eefdbc079d76 (at 10.9.105.67@o2ib4) reconnecting
Apr 30 15:53:48 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages
Apr 30 15:53:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.105.67@o2ib4)
Apr 30 15:53:48 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages
Apr 30 15:55:09 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 15:55:09 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 4 previous similar messages
Apr 30 16:03:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 446d2031-5a06-05f4-6e28-6d5149ce1c88 (at 10.9.104.39@o2ib4) reconnecting
Apr 30 16:03:54 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages
Apr 30 16:03:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.39@o2ib4)
Apr 30 16:03:54 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages
Apr 30 16:05:16 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 16:05:16 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 24 previous similar messages
Apr 30 16:14:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c9ce12e9-3cda-482a-6a30-1bff01061762 (at 10.8.8.36@o2ib6) reconnecting
Apr 30 16:14:28 fir-md1-s2 kernel: Lustre: Skipped 46 previous similar messages
Apr 30 16:14:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.36@o2ib6)
Apr 30 16:14:28 fir-md1-s2 kernel: Lustre: Skipped 46 previous similar messages
Apr 30 16:15:22 fir-md1-s2 kernel: LNetError: 121184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 16:15:22 fir-md1-s2 kernel: LNetError: 121184:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 44 previous similar messages
Apr 30 16:24:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d290c414-d21b-2229-b831-d35e3a056cfa (at 10.8.1.12@o2ib6) reconnecting
Apr 30 16:24:36 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages
Apr 30 16:24:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.1.12@o2ib6)
Apr 30 16:24:36 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages
Apr 30 16:25:42 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 16:25:42 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 20 previous similar messages
Apr 30 16:34:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a327e393-246e-f0b0-a4c7-257350ff9a2e (at 10.9.112.13@o2ib4) reconnecting
Apr 30 16:34:43 fir-md1-s2 kernel: Lustre: Skipped 23 previous similar messages
Apr 30 16:34:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.112.13@o2ib4)
Apr 30 16:34:43 fir-md1-s2 kernel: Lustre: Skipped 23 previous similar messages
Apr 30 16:35:44 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 16:35:44 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 22 previous similar messages
Apr 30 16:40:23 fir-md1-s2 kernel: LustreError: 122646:0:(tgt_handler.c:644:process_req_last_xid()) @@@ Unexpected xid 5cbfbaa4141e0 vs. last_xid 5cbfbaa414c2f req@ffff9308156c2100 x1631656637186528/t0(0) o101->2caa6d04-c2ab-ba8b-1481-f65514e320bd@10.8.17.12@o2ib6:23/0 lens 600/0 e 0 to 0 dl 1556667653 ref 1 fl Interpret:/2/ffffffff rc 0/-1
Apr 30 16:42:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1d5157f9-7efa-61ef-c6f9-b1db29ae7243 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b8d787800, cur 1556667771 expire 1556667621 last 1556667544
Apr 30 16:42:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Apr 30 16:45:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d290c414-d21b-2229-b831-d35e3a056cfa (at 10.8.1.12@o2ib6) reconnecting
Apr 30 16:45:46 fir-md1-s2 kernel: Lustre: Skipped 27 previous similar messages
Apr 30 16:45:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.1.12@o2ib6)
Apr 30 16:45:46 fir-md1-s2 kernel: Lustre: Skipped 29 previous similar messages
Apr 30 16:47:03 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0)
Apr 30 16:47:03 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 30 previous similar messages
Apr 30 17:00:41 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5)
Apr 30 17:00:41 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 11 previous similar messages
Apr 30 17:02:55 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556668968/real 1556668968] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556668975 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Apr 30 17:03:02 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556668975/real 1556668975] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556668982 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 30 17:03:05 fir-md1-s2 kernel: Lustre: 122220:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931ed7be1e00 x1632180367312192/t0(0)
o101->a15481a3-7f9b-2fb5-a19f-95b625f6846e@10.8.1.2@o2ib6:10/0 lens 576/3264 e 1 to 0 dl 1556668990 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:07 fir-md1-s2 kernel: Lustre: 122054:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931707f18c00 x1631642457508976/t0(0) o101->b4dc4310-abd3-57a8-960f-a27b33e667d3@10.8.27.7@o2ib6:12/0 lens 576/3264 e 1 to 0 dl 1556668992 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:09 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556668982/real 1556668982] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556668989 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 17:03:09 fir-md1-s2 kernel: Lustre: 122155:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931e4c364800 x1631712510072528/t0(0) o101->05c8b6b2-04ac-c002-5530-092914937d78@10.8.1.25@o2ib6:14/0 lens 576/3264 e 1 to 0 dl 1556668994 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client a15481a3-7f9b-2fb5-a19f-95b625f6846e (at 10.8.1.2@o2ib6) reconnecting Apr 30 17:03:11 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Apr 30 17:03:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.1.2@o2ib6) Apr 30 17:03:11 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Apr 30 17:03:12 fir-md1-s2 kernel: Lustre: 122166:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931f42b00f00 x1631712348204928/t0(0) o101->25e6a98d-1523-4b7c-d720-65145c7958fc@10.8.11.10@o2ib6:17/0 lens 632/3264 e 1 to 0 dl 1556668997 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:12 fir-md1-s2 kernel: Lustre: 122166:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Apr 30 17:03:16 
fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556668989/real 1556668989] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556668996 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 17:03:17 fir-md1-s2 kernel: Lustre: 122730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932c4227f200 x1632180808459408/t0(0) o101->d290c414-d21b-2229-b831-d35e3a056cfa@10.8.1.12@o2ib6:22/0 lens 576/3264 e 0 to 0 dl 1556669002 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:17 fir-md1-s2 kernel: Lustre: 122730:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Apr 30 17:03:23 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556668996/real 1556668996] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556669003 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 17:03:26 fir-md1-s2 kernel: Lustre: 122041:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff930804bf1e00 x1631543736027360/t0(0) o101->acb643ef-75ad-6f92-b388-57634462f54f@10.8.28.6@o2ib6:1/0 lens 576/3264 e 0 to 0 dl 1556669011 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:26 fir-md1-s2 kernel: Lustre: 122041:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Apr 30 17:03:37 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556669010/real 1556669010] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556669017 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 17:03:37 fir-md1-s2 kernel: Lustre: 
122340:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Apr 30 17:03:45 fir-md1-s2 kernel: Lustre: 122041:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93105581c800 x1631586064454304/t0(0) o101->8a37f7b1-3efc-30e9-f8d1-739df6680357@10.9.104.19@o2ib4:20/0 lens 576/3264 e 0 to 0 dl 1556669030 ref 2 fl Interpret:/0/0 rc 0/0 Apr 30 17:03:45 fir-md1-s2 kernel: Lustre: 122041:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Apr 30 17:03:58 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556669031/real 1556669031] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556669038 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 17:03:58 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Apr 30 17:04:18 fir-md1-s2 kernel: LustreError: 122127:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556668968, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932f522ed100/0x1c35e99e28d3a086 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122127 timeout: 0 lvb_type: 0 Apr 30 17:04:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1556669058.122658 Apr 30 17:04:18 fir-md1-s2 kernel: LustreError: 122127:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 30 17:04:19 fir-md1-s2 kernel: LustreError: 122013:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556668969, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: 
ffff93197a6b2880/0x1c35e99e28d56a26 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122013 timeout: 0 lvb_type: 0 Apr 30 17:04:19 fir-md1-s2 kernel: LustreError: 122013:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Apr 30 17:04:22 fir-md1-s2 kernel: LustreError: 122274:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556668972, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932fba70ba80/0x1c35e99e28dccd84 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122274 timeout: 0 lvb_type: 0 Apr 30 17:04:22 fir-md1-s2 kernel: LustreError: 122274:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Apr 30 17:04:24 fir-md1-s2 kernel: LustreError: 122346:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556668974, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9315c1314800/0x1c35e99e28e35a27 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122346 timeout: 0 lvb_type: 0 Apr 30 17:04:30 fir-md1-s2 kernel: LustreError: 122117:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556668980, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9315a02f9b00/0x1c35e99e28f2408d lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122117 timeout: 0 lvb_type: 0 Apr 30 17:04:30 fir-md1-s2 kernel: LustreError: 
122117:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Apr 30 17:04:34 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556669066/real 1556669066] req@ffff9327c8f26c00 x1632253735386272/t0(0) o104->fir-MDT0003@10.8.10.20@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556669073 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Apr 30 17:04:34 fir-md1-s2 kernel: Lustre: 122340:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Apr 30 17:04:39 fir-md1-s2 kernel: LustreError: 122054:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556668989, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff931688ee0480/0x1c35e99e2902af32 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122054 timeout: 0 lvb_type: 0 Apr 30 17:04:39 fir-md1-s2 kernel: LustreError: 122054:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 8 previous similar messages Apr 30 17:04:57 fir-md1-s2 kernel: LustreError: 122168:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556669007, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff931a84e8ca40/0x1c35e99e29233a97 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 51 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122168 timeout: 0 lvb_type: 0 Apr 30 17:04:57 fir-md1-s2 kernel: LustreError: 122168:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Apr 30 17:05:23 fir-md1-s2 kernel: LustreError: 122340:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.10.20@o2ib6) failed to reply to blocking AST (req@ffff9327c8f26c00 
x1632253735386272 status 0 rc -110), evict it ns: mdt-fir-MDT0003_UUID lock: ffff932d842f6540/0x1c35e99e26615506 lrc: 4/0,0 mode: PR/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 53 type: IBT flags: 0x60200400000020 nid: 10.8.10.20@o2ib6 remote: 0xf9be892c61dc7fc5 expref: 427 pid: 122098 timeout: 194503 lvb_type: 0 Apr 30 17:05:23 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.10.20@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Apr 30 17:05:23 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.10.20@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff932d842f6540/0x1c35e99e26615506 lrc: 3/0,0 mode: PR/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 53 type: IBT flags: 0x60200400000020 nid: 10.8.10.20@o2ib6 remote: 0xf9be892c61dc7fc5 expref: 428 pid: 122098 timeout: 0 lvb_type: 0 Apr 30 17:05:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 53385b7b-a550-b1a8-0abe-3b8ac836eb95 (at 10.8.10.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93156db69800, cur 1556669152 expire 1556669002 last 1556668925 Apr 30 17:05:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Apr 30 17:13:05 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 17:28:03 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 17:28:03 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Apr 30 17:39:23 fir-md1-s2 kernel: LNetError: 121181:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 30 17:39:23 fir-md1-s2 kernel: LNetError: 121181:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Apr 30 17:39:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client af26ea58-5b3c-18ce-a05f-14f0d6aed832 (at 10.9.0.63@o2ib4) reconnecting Apr 30 17:39:30 fir-md1-s2 kernel: Lustre: Skipped 153 previous similar messages Apr 30 17:39:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.0.63@o2ib4) Apr 30 17:39:30 fir-md1-s2 kernel: Lustre: Skipped 155 previous similar messages Apr 30 17:52:42 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 17:52:42 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Apr 30 18:05:41 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 18:05:41 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 5 previous similar messages Apr 30 18:25:18 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, 
don't perform health checking (0, 5) Apr 30 18:25:18 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Apr 30 18:44:15 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 18:44:15 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Apr 30 19:00:10 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 19:00:10 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Apr 30 19:16:10 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 19:16:10 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Apr 30 19:30:56 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 19:30:56 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Apr 30 19:51:52 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) Apr 30 19:51:52 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Apr 30 19:51:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client b2fec681-7e3f-a2ea-8cb9-5b0cd1294390 (at 10.9.114.15@o2ib4) reconnecting Apr 30 19:51:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 782f60b1-717d-ff4f-8bab-0951282de63b (at 10.9.112.11@o2ib4) Apr 30 19:51:59 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Apr 30 19:54:25 
fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:08:13 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:08:13 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages Apr 30 20:09:54 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:09:54 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Apr 30 20:17:47 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:19:28 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:23:26 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:23:26 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Apr 30 20:33:38 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 20:51:04 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 21:06:09 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 21:06:09 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages Apr 30 21:18:03 fir-md1-s2 kernel: LNetError: 
121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 21:18:03 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message Apr 30 21:47:15 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 22:25:11 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 22:31:32 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Apr 30 22:33:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6fab2fb5-26e6-7b9a-b3d9-fd518701970b (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b47372800, cur 1556688803 expire 1556688653 last 1556688576 Apr 30 22:33:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6fab2fb5-26e6-7b9a-b3d9-fd518701970b (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931051469c00, cur 1556688823 expire 1556688673 last 1556688596 Apr 30 23:09:00 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 00:03:09 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 01:30:52 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 01:54:16 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 02:03:54 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 02:04:04 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 02:27:27 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 02:37:17 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 03:22:46 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 03:27:29 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 03:34:37 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 03:39:04 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 
5) May 01 04:22:14 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 05:02:30 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 05:03:15 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 05:04:42 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 05:12:29 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 05:38:13 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 05:41:08 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 06:41:36 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 06:46:38 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 07:05:00 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 07:28:53 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 07:41:02 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 08:17:30 fir-md1-s2 kernel: LNetError: 
121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 08:20:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6b2f9741-e509-4243-058a-e7872e15cb5c (at 10.8.1.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93307cac1800, cur 1556724020 expire 1556723870 last 1556723793 May 01 08:22:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.31@o2ib6) May 01 08:22:30 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 01 08:39:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 1067cc99-569b-6c01-e8cd-7bfbb2eea42a (at 10.8.14.5@o2ib6) May 01 08:39:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 08:39:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 770c7550-60ce-b00d-c0ae-73d52a13d9c0 (at 10.8.13.23@o2ib6) May 01 08:39:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 08:39:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.13.24@o2ib6) May 01 08:39:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 08:55:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.12.33@o2ib6) May 01 08:55:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 09:48:58 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 10:14:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c002d779-213f-8764-b0ce-a364b557d98d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b1c6fbc00, cur 1556730845 expire 1556730695 last 1556730618 May 01 10:14:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 10:14:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 10:14:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 10:44:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0840d825-5e1d-ab09-748d-b5fef372f47f (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93081be32c00, cur 1556732691 expire 1556732541 last 1556732464 May 01 10:44:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 10:45:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 01 10:45:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 11:35:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c7c1e785-7484-b308-7dc5-6b63513d6220 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930cb5f14000, cur 1556735759 expire 1556735609 last 1556735532 May 01 11:35:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 11:36:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 11:36:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 11:55:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 84153925-7318-d597-37bf-61264542eb58 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931947bb9c00, cur 1556736958 expire 1556736808 last 1556736731 May 01 11:55:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 11:56:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 11:56:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:07:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client dfe61200-863e-32be-7d68-5233540a9762 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931669aec400, cur 1556759237 expire 1556759087 last 1556759010 May 01 18:07:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:07:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 18:07:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:11:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7592b62b-cd74-82e4-03cd-75fb5e0a226b (at 10.8.14.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff934076cb8800, cur 1556759514 expire 1556759364 last 1556759287 May 01 18:11:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:37:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 18:37:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:38:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4cd88215-e667-298c-fb54-c17c8301efbb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93301ff7d400, cur 1556761121 expire 1556760971 last 1556760894 May 01 18:38:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:43:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.9@o2ib6) May 01 18:43:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:51:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client dd447c0e-bf16-d0be-6449-bf36e688df99 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b46bd2c00, cur 1556761881 expire 1556761731 last 1556761654 May 01 18:51:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 18:51:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 01 18:51:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:04:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fe35d843-20f4-288f-d8db-52dd32b58570 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930b07bdbc00, cur 1556762672 expire 1556762522 last 1556762445 May 01 19:04:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:04:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 01 19:04:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:08:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 64b566c2-ebb5-7da0-af60-514dba7cee07 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932f08420800, cur 1556762933 expire 1556762783 last 1556762706 May 01 19:08:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:09:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 19:09:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:13:40 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556763213/real 1556763213] req@ffff931d4e378900 x1632254721495920/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556763220 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 01 19:13:40 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 01 19:13:48 fir-md1-s2 kernel: Lustre: 122166:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931b7df27200 x1631565227324112/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:23/0 lens 480/568 e 1 to 0 dl 1556763233 ref 2 fl Interpret:/0/0 rc 0/0 May 01 19:13:48 fir-md1-s2 kernel: Lustre: 122166:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages May 01 19:13:54 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556763227/real 1556763227] req@ffff931d4e378900 x1632254721495920/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556763234 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 19:13:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 01 19:13:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.24@o2ib6) May 01 19:13:54 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 01 19:14:15 fir-md1-s2 kernel: 
Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556763248/real 1556763248] req@ffff931d4e378900 x1632254721495920/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556763255 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 19:14:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 01 19:14:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.24@o2ib6) May 01 19:14:15 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 01 19:14:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 01 19:14:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.24@o2ib6) May 01 19:14:50 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556763283/real 1556763283] req@ffff931d4e378900 x1632254721495920/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556763290 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 01 19:14:50 fir-md1-s2 kernel: Lustre: 122662:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 01 19:14:50 fir-md1-s2 kernel: LustreError: 122662:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff931d4e378900 x1632254721495920 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931fa6974ec0/0x1c35e9ae8441cdfb lrc: 4/0,0 mode: PR/PR res: [0x240025f32:0x679:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x6819141b9e88f366 expref: 82 pid: 122029 timeout: 288678 lvb_type: 0 May 01 19:14:50 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due 
to a lock blocking callback time out: rc -107 May 01 19:14:50 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 77s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931fa6974ec0/0x1c35e9ae8441cdfb lrc: 3/0,0 mode: PR/PR res: [0x240025f32:0x679:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x6819141b9e88f366 expref: 83 pid: 122029 timeout: 0 lvb_type: 0 May 01 19:15:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2acff163-453f-6866-ca52-3be787a802e5 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b673a7c00, cur 1556763345 expire 1556763195 last 1556763118 May 01 19:15:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:15:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 19:15:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:20:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ef0ca740-68f8-0d2f-af07-d739c91e59f6 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932c62b16c00, cur 1556763647 expire 1556763497 last 1556763420 May 01 19:21:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 01 19:21:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:33:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b169cfff-999c-3e08-edaf-bc412cfb2b0a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933ce57c7c00, cur 1556764435 expire 1556764285 last 1556764208 May 01 19:33:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:34:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 01 19:34:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 19:49:50 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 20:09:36 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 20:12:00 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 20:56:23 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 21:00:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6667e1fb-9e5d-8122-f716-8d2ca6b880cd (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9308d5626c00, cur 1556769645 expire 1556769495 last 1556769418 May 01 21:00:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 21:00:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6667e1fb-9e5d-8122-f716-8d2ca6b880cd (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931cb4e63400, cur 1556769656 expire 1556769506 last 1556769429 May 01 21:02:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 01 21:02:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 21:36:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5242334c-3a63-f428-27e9-84a9b8569357 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930be3b05400, cur 1556771818 expire 1556771668 last 1556771591 May 01 21:37:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 01 21:37:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 21:37:25 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 21:54:35 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 21:59:07 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:06:58 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:17:56 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:18:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 90fd09f3-1e4c-d89d-b1ef-509c9c50dd06 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93278cebfc00, cur 1556774296 expire 1556774146 last 1556774069 May 01 22:18:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 22:18:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.9.8@o2ib6) May 01 22:18:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 22:40:24 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:48:32 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:53:23 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:53:57 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 22:56:51 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) May 01 22:56:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0f8f808f-b03b-81e6-e30e-46ff547f2e45 (at 10.9.113.3@o2ib4) reconnecting May 01 22:56:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.113.3@o2ib4) May 01 23:11:08 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:12:16 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:12:23 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:13:29 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health 
checking (0, 5) May 01 23:17:54 fir-md1-s2 kernel: Lustre: 122692:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:20:58 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:22:05 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:25:47 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:25:52 fir-md1-s2 kernel: Lustre: 121638:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:25:52 fir-md1-s2 kernel: Lustre: 121638:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 01 23:25:56 fir-md1-s2 kernel: Lustre: 122182:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:26:15 fir-md1-s2 kernel: Lustre: 122044:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:26:15 fir-md1-s2 kernel: Lustre: 122044:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 01 23:28:03 fir-md1-s2 kernel: Lustre: 122003:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:29:03 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:30:19 fir-md1-s2 kernel: Lustre: 121996:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:31:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e2a571ed-a09d-5b66-3666-df63bf8e2019 (at 10.8.10.33@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff93156db68000, cur 1556778698 expire 1556778548 last 1556778471 May 01 23:31:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 01 23:32:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e2a571ed-a09d-5b66-3666-df63bf8e2019 (at 10.8.10.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930be8ba7000, cur 1556778720 expire 1556778570 last 1556778493 May 01 23:34:18 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:34:18 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages May 01 23:37:30 fir-md1-s2 kernel: Lustre: 122096:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:38:25 fir-md1-s2 kernel: LNetError: 121184:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:40:44 fir-md1-s2 kernel: Lustre: 122699:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:40:44 fir-md1-s2 kernel: Lustre: 122699:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages May 01 23:42:23 fir-md1-s2 kernel: Lustre: 122438:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9314fe608850 x1631297234590688/t0(0) o4->301bd1d5-f294-00c7-57bb-0517ce6cb157@10.8.15.8@o2ib6:28/0 lens 488/448 e 1 to 0 dl 1556779348 ref 2 fl Interpret:/0/0 rc 0/0 May 01 23:42:29 fir-md1-s2 kernel: Lustre: 122471:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779328/real 1556779328] req@ffff931d186ca100 x1632254943904320/t0(0) o601->fir-MDT0000-lwp-MDT0003@10.0.10.51@o2ib7:23/10 lens 336/336 e 1 to 1 dl 1556779349 ref 2 fl 
Rpc:XN/0/ffffffff rc 0/-1 May 01 23:42:29 fir-md1-s2 kernel: Lustre: fir-MDT0000-lwp-MDT0003: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete May 01 23:42:29 fir-md1-s2 kernel: Lustre: 122471:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff9314fe608850 x1631297234590688/t365603108752(0) o4->301bd1d5-f294-00c7-57bb-0517ce6cb157@10.8.15.8@o2ib6:28/0 lens 488/416 e 1 to 0 dl 1556779348 ref 1 fl Complete:/0/0 rc 0/0 May 01 23:42:29 fir-md1-s2 kernel: Lustre: fir-MDT0000-lwp-MDT0003: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) May 01 23:43:01 fir-md1-s2 kernel: Lustre: 121233:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556779350/real 1556779350] req@ffff930e47fdfb00 x1632254944175680/t0(0) o601->fir-MDT0000-lwp-MDT0003@10.0.10.51@o2ib7:23/10 lens 336/336 e 0 to 1 dl 1556779381 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 May 01 23:43:01 fir-md1-s2 kernel: Lustre: fir-MDT0000-lwp-MDT0003: Connection to fir-MDT0000 (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will wait for recovery to complete May 01 23:43:01 fir-md1-s2 kernel: Lustre: fir-MDT0000-lwp-MDT0003: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7) May 01 23:45:26 fir-md1-s2 kernel: Lustre: 121992:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 23:45:26 fir-md1-s2 kernel: Lustre: 121992:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages May 01 23:51:30 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 01 23:54:01 fir-md1-s2 kernel: Lustre: 122664:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 01 
23:54:01 fir-md1-s2 kernel: Lustre: 122664:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages May 02 00:01:16 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:07:41 fir-md1-s2 kernel: Lustre: 122123:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 00:07:41 fir-md1-s2 kernel: Lustre: 122123:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 56 previous similar messages May 02 00:08:25 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:12:14 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:22:44 fir-md1-s2 kernel: Lustre: 122229:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 00:22:44 fir-md1-s2 kernel: Lustre: 122229:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 25 previous similar messages May 02 00:25:30 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:26:17 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:31:33 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:40:02 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 00:40:06 fir-md1-s2 kernel: Lustre: 122109:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 00:40:06 
fir-md1-s2 kernel: Lustre: 122109:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 103 previous similar messages May 02 00:56:15 fir-md1-s2 kernel: Lustre: 122041:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0003: Failure to clear the changelog for user 1: -22 May 02 00:56:15 fir-md1-s2 kernel: Lustre: 122041:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 00:57:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b47376c00, cur 1556783856 expire 1556783706 last 1556783629 May 02 00:58:08 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 01:01:39 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 01:04:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 01:04:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:08:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bb5df75e-47d2-6116-c28b-57643841d372 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9305baebd800, cur 1556784514 expire 1556784364 last 1556784287 May 02 01:08:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:08:39 fir-md1-s2 kernel: Lustre: 122633:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 01:08:39 fir-md1-s2 kernel: Lustre: 122633:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages May 02 01:13:25 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 01:15:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 01:15:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:19:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 01:19:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:19:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ff37667d-58c6-941f-43b8-fbc08903c9b3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93161f771800, cur 1556785172 expire 1556785022 last 1556784945 May 02 01:19:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:20:01 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 01:20:01 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages May 02 01:25:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ea0ec5c3-6829-d0ee-e1d8-43c3a991f5ee (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931622ef5400, cur 1556785518 expire 1556785368 last 1556785291 May 02 01:25:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:25:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 01:25:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:25:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ea0ec5c3-6829-d0ee-e1d8-43c3a991f5ee (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931a973ef000, cur 1556785524 expire 1556785374 last 1556785297 May 02 01:26:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 90508cea-f23b-ead6-21b6-eb1abb6d51cd (at 10.8.23.14@o2ib6) in 172 seconds. I think it's dead, and I am evicting it. exp ffff9310183b7400, cur 1556785594 expire 1556785444 last 1556785422 May 02 01:26:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 90508cea-f23b-ead6-21b6-eb1abb6d51cd (at 10.8.23.14@o2ib6) in 183 seconds. I think it's dead, and I am evicting it. exp ffff9315f6e95000, cur 1556785600 expire 1556785450 last 1556785417 May 02 01:27:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 02 01:27:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:29:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4f271e75-b2c7-65df-808a-6ee3fc024815 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93161a314000, cur 1556785753 expire 1556785603 last 1556785526 May 02 01:29:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 01:29:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:34:56 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 01:35:54 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 01:35:54 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 01:38:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8d664866-17c7-7f44-31eb-66dff3b61f71 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930786e5a400, cur 1556786331 expire 1556786181 last 1556786104 May 02 01:38:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:39:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 01:39:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 01:43:47 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 01:52:33 fir-md1-s2 kernel: Lustre: 122638:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 01:52:33 fir-md1-s2 kernel: Lustre: 122638:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages May 02 02:09:39 fir-md1-s2 kernel: Lustre: 122298:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 02:09:39 fir-md1-s2 kernel: Lustre: 122298:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 25 
previous similar messages May 02 02:28:58 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 02:39:59 fir-md1-s2 kernel: Lustre: 122187:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 02:39:59 fir-md1-s2 kernel: Lustre: 122187:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 24 previous similar messages May 02 02:46:13 fir-md1-s2 kernel: Lustre: 11910:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 02:46:13 fir-md1-s2 kernel: Lustre: 11910:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 02:47:47 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 02:53:26 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 02:54:34 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 02:57:17 fir-md1-s2 kernel: Lustre: 122187:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 02:57:17 fir-md1-s2 kernel: Lustre: 122187:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 63 previous similar messages May 02 03:05:18 fir-md1-s2 kernel: Lustre: 122665:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 03:05:18 fir-md1-s2 kernel: Lustre: 122665:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages May 02 03:08:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client aec68a6a-7cc2-5a83-a3fa-b45b5d00f2f3 (at 10.8.21.21@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff93301f37ac00, cur 1556791739 expire 1556791589 last 1556791512 May 02 03:08:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 03:09:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 02 03:09:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 03:18:19 fir-md1-s2 kernel: Lustre: 122131:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 03:18:19 fir-md1-s2 kernel: Lustre: 122131:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 37 previous similar messages May 02 03:19:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0bd5061e-994e-558c-c820-3ad8bf31cfa8 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9316b0a7c400, cur 1556792397 expire 1556792247 last 1556792170 May 02 03:19:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 03:20:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 02 03:20:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 03:29:08 fir-md1-s2 kernel: Lustre: 121992:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 03:29:08 fir-md1-s2 kernel: Lustre: 121992:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages May 02 03:32:52 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 03:42:09 fir-md1-s2 kernel: Lustre: 122656:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 03:42:09 fir-md1-s2 kernel: Lustre: 122656:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages May 02 03:54:52 fir-md1-s2 kernel: LNetError: 
121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 03:57:36 fir-md1-s2 kernel: Lustre: 121992:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 03:57:36 fir-md1-s2 kernel: Lustre: 121992:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages May 02 04:07:38 fir-md1-s2 kernel: Lustre: 122303:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 04:07:38 fir-md1-s2 kernel: Lustre: 122303:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 15 previous similar messages May 02 04:17:21 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 04:19:49 fir-md1-s2 kernel: Lustre: 121625:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 04:19:49 fir-md1-s2 kernel: Lustre: 121625:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 29 previous similar messages May 02 04:25:51 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 04:39:22 fir-md1-s2 kernel: Lustre: 122712:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 04:39:22 fir-md1-s2 kernel: Lustre: 122712:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages May 02 04:58:07 fir-md1-s2 kernel: Lustre: 122695:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 04:58:07 fir-md1-s2 kernel: Lustre: 122695:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages May 02 05:10:42 fir-md1-s2 kernel: Lustre: 122096:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog 
for user 1: -22 May 02 05:10:42 fir-md1-s2 kernel: Lustre: 122096:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages May 02 05:20:54 fir-md1-s2 kernel: Lustre: 122667:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 05:20:54 fir-md1-s2 kernel: Lustre: 122667:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages May 02 05:29:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3fbaa703-3adf-9bb5-3d07-350b21402455 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9316ae769c00, cur 1556800148 expire 1556799998 last 1556799921 May 02 05:29:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 05:29:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.21.21@o2ib6) May 02 05:29:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 05:32:02 fir-md1-s2 kernel: Lustre: 122638:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 05:32:02 fir-md1-s2 kernel: Lustre: 122638:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages May 02 05:42:45 fir-md1-s2 kernel: Lustre: 122638:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 05:42:45 fir-md1-s2 kernel: Lustre: 122638:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages May 02 05:46:31 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 05:53:45 fir-md1-s2 kernel: Lustre: 121415:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 05:53:45 fir-md1-s2 kernel: Lustre: 121415:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 47 previous similar messages May 02 05:59:53 fir-md1-s2 
kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 06:08:26 fir-md1-s2 kernel: Lustre: 122014:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 06:08:26 fir-md1-s2 kernel: Lustre: 122014:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages May 02 06:18:32 fir-md1-s2 kernel: Lustre: 122178:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 06:29:17 fir-md1-s2 kernel: Lustre: 121986:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 06:29:17 fir-md1-s2 kernel: Lustre: 121986:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 19 previous similar messages May 02 06:39:30 fir-md1-s2 kernel: Lustre: 122660:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 06:39:30 fir-md1-s2 kernel: Lustre: 122660:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 41 previous similar messages May 02 06:49:45 fir-md1-s2 kernel: Lustre: 122643:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 07:11:24 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 07:11:24 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 27 previous similar messages May 02 07:20:12 fir-md1-s2 kernel: Lustre: 122698:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 07:20:12 fir-md1-s2 kernel: Lustre: 122698:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 07:23:01 fir-md1-s2 kernel: Lustre: 122001:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the 
changelog for user 1: -22 May 02 07:23:01 fir-md1-s2 kernel: Lustre: 122001:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 02 07:31:18 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 07:33:08 fir-md1-s2 kernel: Lustre: 122695:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 07:33:08 fir-md1-s2 kernel: Lustre: 122695:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages May 02 07:42:21 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 07:46:08 fir-md1-s2 kernel: Lustre: 122056:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 07:46:08 fir-md1-s2 kernel: Lustre: 122056:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 40 previous similar messages May 02 07:59:48 fir-md1-s2 kernel: Lustre: 121650:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 07:59:48 fir-md1-s2 kernel: Lustre: 121650:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 24 previous similar messages May 02 08:12:24 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 08:12:24 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages May 02 08:20:24 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:24:14 fir-md1-s2 kernel: Lustre: 122018:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 08:24:14 fir-md1-s2 kernel: Lustre: 
122018:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 26 previous similar messages May 02 08:28:27 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:28:38 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:34:38 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:37:21 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:37:54 fir-md1-s2 kernel: Lustre: 122698:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 08:37:54 fir-md1-s2 kernel: Lustre: 122698:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 27 previous similar messages May 02 08:44:30 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:48:28 fir-md1-s2 kernel: Lustre: 122048:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 08:48:28 fir-md1-s2 kernel: Lustre: 122048:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 08:54:28 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:56:01 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:58:37 fir-md1-s2 kernel: Lustre: 122348:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 08:58:37 fir-md1-s2 kernel: Lustre: 
122348:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 02 08:59:39 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 08:59:39 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 02 09:06:21 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 09:09:52 fir-md1-s2 kernel: Lustre: 122058:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 09:09:52 fir-md1-s2 kernel: Lustre: 122058:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages May 02 09:16:49 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 09:16:49 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 2 previous similar messages May 02 09:24:38 fir-md1-s2 kernel: Lustre: 122317:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 09:24:38 fir-md1-s2 kernel: Lustre: 122317:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 09:25:32 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 09:25:32 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 02 09:40:40 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 09:40:40 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages May 02 09:41:45 
fir-md1-s2 kernel: Lustre: 122637:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 09:41:45 fir-md1-s2 kernel: Lustre: 122637:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 02 09:52:43 fir-md1-s2 kernel: Lustre: 122667:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 09:52:43 fir-md1-s2 kernel: Lustre: 122667:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 16 previous similar messages May 02 09:53:06 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 09:53:06 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 02 10:03:19 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 10:04:00 fir-md1-s2 kernel: Lustre: 122044:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 10:04:00 fir-md1-s2 kernel: Lustre: 122044:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages May 02 10:16:13 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 10:16:13 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 02 10:19:02 fir-md1-s2 kernel: Lustre: 122040:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 10:19:02 fir-md1-s2 kernel: Lustre: 122040:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 10:27:19 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't 
perform health checking (0, 5) May 02 10:27:19 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 02 10:31:59 fir-md1-s2 kernel: Lustre: 122310:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 10:31:59 fir-md1-s2 kernel: Lustre: 122310:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 11 previous similar messages May 02 10:44:27 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 10:44:27 fir-md1-s2 kernel: Lustre: 122708:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 10 previous similar messages May 02 10:57:55 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 10:57:55 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 3 previous similar messages May 02 10:58:01 fir-md1-s2 kernel: Lustre: 122704:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 10:58:01 fir-md1-s2 kernel: Lustre: 122704:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 37 previous similar messages May 02 11:03:26 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 11:03:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 44af0c56-5a88-a72b-6045-a7b009a95d81 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931ec2bcf800, cur 1556820222 expire 1556820072 last 1556819995 May 02 11:03:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:07:40 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 11:12:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 11:12:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:15:39 fir-md1-s2 kernel: LNetError: 121171:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 11:23:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4ddbaacf-8a30-33af-ce74-60975dfc2df4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9303d4f8e800, cur 1556821401 expire 1556821251 last 1556821174 May 02 11:23:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:24:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 11:24:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:33:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 172c526d-c0f5-9a2a-9f65-ccfb5a20ff9a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931cd5a01000, cur 1556822033 expire 1556821883 last 1556821806 May 02 11:33:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:34:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 11:34:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:39:16 fir-md1-s2 kernel: Lustre: 122041:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 11:39:16 fir-md1-s2 kernel: Lustre: 122041:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages May 02 11:41:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 58f926d8-2802-f47d-7c08-c8978d4a4d11 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932eb9ae0c00, cur 1556822472 expire 1556822322 last 1556822245 May 02 11:41:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:41:18 fir-md1-s2 kernel: Lustre: 122096:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 11:41:18 fir-md1-s2 kernel: Lustre: 122096:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages May 02 11:42:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 11:42:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:44:14 fir-md1-s2 kernel: Lustre: 122719:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 11:44:14 fir-md1-s2 kernel: Lustre: 122719:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages May 02 11:50:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c2b0b8a6-df91-5479-ef43-e8943f21239d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930dafe2b000, cur 1556823009 expire 1556822859 last 1556822782 May 02 11:50:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 11:50:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c2b0b8a6-df91-5479-ef43-e8943f21239d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932be47efc00, cur 1556823019 expire 1556822869 last 1556822792 May 02 11:50:19 fir-md1-s2 kernel: Lustre: 122661:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 11:50:19 fir-md1-s2 kernel: Lustre: 122661:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages May 02 11:50:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 11:50:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:05:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6aeb6079-402b-3bb1-a2d3-c82f7dedb64c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932d2c3e6000, cur 1556823914 expire 1556823764 last 1556823687 May 02 12:05:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6aeb6079-402b-3bb1-a2d3-c82f7dedb64c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93168d672000, cur 1556823933 expire 1556823783 last 1556823706 May 02 12:06:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:06:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:11:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9eff0b4e-3afc-f49d-130c-6037f4f08b84 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931de2747000, cur 1556824310 expire 1556824160 last 1556824083 May 02 12:12:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:12:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:15:20 fir-md1-s2 kernel: Lustre: 122163:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:15:20 fir-md1-s2 kernel: Lustre: 122163:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 46 previous similar messages May 02 12:17:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e1793f57-a4ba-6d03-0fea-3e33901bbb9a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930ca2fa4c00, cur 1556824648 expire 1556824498 last 1556824421 May 02 12:17:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:21:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:21:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:26:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 93ac24e1-823a-ff22-9381-5377a6e67cd8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931d6eddf400, cur 1556825175 expire 1556825025 last 1556824948 May 02 12:26:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:27:26 fir-md1-s2 kernel: Lustre: 122310:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:27:26 fir-md1-s2 kernel: Lustre: 122310:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 12:28:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:28:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:33:11 fir-md1-s2 kernel: Lustre: 122011:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:33:11 fir-md1-s2 kernel: Lustre: 122011:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages May 02 12:34:29 fir-md1-s2 kernel: Lustre: 122121:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:34:29 fir-md1-s2 kernel: Lustre: 122121:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages May 02 12:34:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 47cd7d71-22c2-28c6-529e-0733b434823f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f74bc5c00, cur 1556825686 expire 1556825536 last 1556825459 May 02 12:34:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:35:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:35:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:40:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c9762523-e50e-5d2d-7f55-8a388512d616 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b1fb05000, cur 1556826038 expire 1556825888 last 1556825811 May 02 12:40:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:44:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:44:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:47:58 fir-md1-s2 kernel: Lustre: 121480:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:47:58 fir-md1-s2 kernel: Lustre: 121480:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages May 02 12:50:53 fir-md1-s2 kernel: Lustre: 122018:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:50:53 fir-md1-s2 kernel: Lustre: 122018:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 12:54:27 fir-md1-s2 kernel: Lustre: 122044:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:56:02 fir-md1-s2 kernel: Lustre: 122121:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 12:56:02 fir-md1-s2 kernel: Lustre: 122121:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 12:56:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bd662612-7c6d-b660-6da2-0730c6cccb0c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93080ffa5c00, cur 1556826965 expire 1556826815 last 1556826738 May 02 12:56:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 12:59:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 12:59:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:03:42 fir-md1-s2 kernel: Lustre: 122014:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:04:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 45ccfd2e-c9a2-5159-cfd1-e5d0ed6a8547 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930ac5fb3800, cur 1556827487 expire 1556827337 last 1556827260 May 02 13:04:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:05:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:05:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:06:53 fir-md1-s2 kernel: Lustre: 122651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:06:53 fir-md1-s2 kernel: Lustre: 122651:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages May 02 13:10:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6f24849a-5082-4c56-5222-ba0806df8317 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930e69281800, cur 1556827809 expire 1556827659 last 1556827582 May 02 13:10:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:13:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:13:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:16:44 fir-md1-s2 kernel: Lustre: 121480:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:16:44 fir-md1-s2 kernel: Lustre: 121480:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 13:19:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 381cf436-f89c-8581-2b6b-48bc0bd0427a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93036ce8b400, cur 1556828340 expire 1556828190 last 1556828113 May 02 13:19:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:19:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:19:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:26:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:26:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:27:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bcd1f643-86fc-1a53-19ae-6fe448ff66ff (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932fe0a4ec00, cur 1556828832 expire 1556828682 last 1556828605 May 02 13:27:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:27:55 fir-md1-s2 kernel: Lustre: 121651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:27:55 fir-md1-s2 kernel: Lustre: 121651:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 02 13:31:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 890e4d16-d7c3-3317-6d52-b0c9fa5bef3f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93008d448000, cur 1556829062 expire 1556828912 last 1556828835 May 02 13:31:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:31:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:31:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:36:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a8b9138a-75f6-8796-bab3-20790c25867e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930e9624ec00, cur 1556829373 expire 1556829223 last 1556829146 May 02 13:36:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:36:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:36:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:37:57 fir-md1-s2 kernel: Lustre: 122167:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:37:57 fir-md1-s2 kernel: Lustre: 122167:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 4 previous similar messages May 02 13:38:42 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-125, 0) May 02 13:38:42 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 02 13:38:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 170f3856-60d8-2b8e-75f5-a5d9637ffe80 (at 10.9.105.19@o2ib4) reconnecting May 02 13:38:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.105.21@o2ib4) May 02 13:38:49 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 02 13:40:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5c284da1-6a2b-bc9e-3788-39b9513d00de (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b31eee000, cur 1556829654 expire 1556829504 last 1556829427 May 02 13:40:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 13:47:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:47:30 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 02 13:48:22 fir-md1-s2 kernel: Lustre: 122303:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:48:22 fir-md1-s2 kernel: Lustre: 122303:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 5 previous similar messages May 02 13:52:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9ca84ea1-b5eb-0326-12af-93722f0f8262 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93163339bc00, cur 1556830354 expire 1556830204 last 1556830127 May 02 13:52:34 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 13:56:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 13:56:39 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 13:58:38 fir-md1-s2 kernel: Lustre: 122011:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 13:58:38 fir-md1-s2 kernel: Lustre: 122011:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages May 02 14:11:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c4e75ec6-3deb-55b0-3f43-268cf1ca9b51 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93283bfa4400, cur 1556831503 expire 1556831353 last 1556831276 May 02 14:11:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 14:12:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 14:12:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 14:12:34 fir-md1-s2 kernel: Lustre: 122001:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 14:12:34 fir-md1-s2 kernel: Lustre: 122001:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 02 14:25:36 fir-md1-s2 kernel: Lustre: 122022:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 14:25:36 fir-md1-s2 kernel: Lustre: 122022:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 23 previous similar messages May 02 14:37:12 fir-md1-s2 kernel: Lustre: 122643:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 14:37:12 fir-md1-s2 kernel: Lustre: 122643:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 02 14:41:39 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 14:54:28 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 15:02:58 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 15:09:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 224573b7-e555-fcb4-9196-684f2aee08d5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93183e64c400, cur 1556834989 expire 1556834839 last 1556834762 May 02 15:09:49 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 15:10:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 15:10:35 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 15:13:47 fir-md1-s2 kernel: Lustre: 122163:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 15:13:47 fir-md1-s2 kernel: Lustre: 122163:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 02 15:19:18 fir-md1-s2 kernel: Lustre: 121996:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 15:25:14 fir-md1-s2 kernel: Lustre: 122716:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 15:31:45 fir-md1-s2 kernel: Lustre: 122080:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 15:31:45 fir-md1-s2 kernel: Lustre: 122080:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message May 02 16:09:31 fir-md1-s2 kernel: Lustre: 122646:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 16:09:31 fir-md1-s2 kernel: Lustre: 122646:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages May 02 16:14:41 fir-md1-s2 kernel: Lustre: 122044:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 16:28:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 16:28:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 16:29:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 
111d3c6e-044c-2474-7739-f5cf178c4f50 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315aaffac00, cur 1556839755 expire 1556839605 last 1556839528 May 02 16:29:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 16:30:30 fir-md1-s2 kernel: Lustre: 121651:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 16:37:36 fir-md1-s2 kernel: Lustre: 122103:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 16:47:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5ea8819a-828a-65cf-f59c-c1f9bbaca44f (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ca0775c00, cur 1556840865 expire 1556840715 last 1556840638 May 02 16:47:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 16:48:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 16:48:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 17:23:34 fir-md1-s2 kernel: list passed to list_sort() too long for efficiency May 02 17:23:47 fir-md1-s2 kernel: Lustre: 122006:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93105068ec00 x1632086229498720/t0(0) o502->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:22/0 lens 272/0 e 1 to 0 dl 1556843032 ref 2 fl Interpret:/0/ffffffff rc 0/-1 May 02 17:23:48 fir-md1-s2 kernel: Lustre: 122653:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 6s req@ffff93302761c850 x1632265247107728/t0(0) o1000->fir-MDT0000-mdtlov_UUID@10.0.10.51@o2ib7:0/0 lens 304/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 May 02 17:44:34 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) 
May 02 18:06:56 fir-md1-s2 kernel: Lustre: 122109:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 18:19:37 fir-md1-s2 kernel: Lustre: 122637:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 18:19:37 fir-md1-s2 kernel: Lustre: 122637:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 65 previous similar messages May 02 18:19:40 fir-md1-s2 kernel: Lustre: 122704:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 18:19:40 fir-md1-s2 kernel: Lustre: 122704:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages May 02 18:20:30 fir-md1-s2 kernel: Lustre: 122704:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 02 18:20:30 fir-md1-s2 kernel: Lustre: 122704:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages May 02 19:26:33 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 19:28:10 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8b37eb62-720c-7043-4839-bef9879e22f7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931c632d0000, cur 1556850490 expire 1556850340 last 1556850263 May 02 19:28:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 19:28:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8b37eb62-720c-7043-4839-bef9879e22f7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930c22361800, cur 1556850499 expire 1556850349 last 1556850272 May 02 19:28:52 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 19:28:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 19:28:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 19:41:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client fd434721-6365-6185-32ff-00f1f3487de4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931934697c00, cur 1556851316 expire 1556851166 last 1556851089 May 02 19:42:18 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 19:42:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 19:42:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 19:51:07 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 02 19:58:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ee0a0f5b-1a45-2dc9-8ad3-be9445926234 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932c073a6000, cur 1556852314 expire 1556852164 last 1556852087 May 02 19:58:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 19:58:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ee0a0f5b-1a45-2dc9-8ad3-be9445926234 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93200ebd0800, cur 1556852316 expire 1556852166 last 1556852089 May 02 19:58:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 19:58:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:10:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5ade0ad4-3544-d046-38f3-e328f1c49cb6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93074ff16800, cur 1556853018 expire 1556852868 last 1556852791 May 02 20:10:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5ade0ad4-3544-d046-38f3-e328f1c49cb6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315a1b2a000, cur 1556853027 expire 1556852877 last 1556852800 May 02 20:11:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 20:11:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:13:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 59acbc1a-7ffb-ea0d-d547-724bb9e2d549 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932dd962d000, cur 1556853238 expire 1556853088 last 1556853011 May 02 20:14:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:14:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:18:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2db61525-0572-20c7-4bec-8eacb2081288 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931574244400, cur 1556853495 expire 1556853345 last 1556853268 May 02 20:18:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:18:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 2db61525-0572-20c7-4bec-8eacb2081288 (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff933ef3301c00, cur 1556853511 expire 1556853361 last 1556853284 May 02 20:18:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:18:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:22:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client eb37cfba-87f3-87c0-46ed-5e5b8889c7b4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f3de22400, cur 1556853749 expire 1556853599 last 1556853522 May 02 20:23:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 20:23:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:23:22 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556853795/real 1556853795] req@ffff9327a8edbf00 x1632265677873424/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556853802 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 20:23:29 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556853802/real 1556853802] req@ffff9327a8edbf00 x1632265677873424/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556853809 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:23:30 fir-md1-s2 kernel: Lustre: 122290:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932804b54e00 x1631546177747424/t0(0) o101->dc88b1ac-f571-192a-a71e-3c30ff0f97bc@10.8.7.8@o2ib6:5/0 lens 480/568 e 1 to 0 dl 1556853815 ref 2 fl Interpret:/0/0 rc 0/0 May 02 20:23:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) reconnecting May 02 20:23:37 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) 
@@@ Request sent has timed out for slow reply: [sent 1556853809/real 1556853809] req@ffff9327a8edbf00 x1632265677873424/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556853816 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:23:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.7.8@o2ib6) May 02 20:23:51 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556853823/real 1556853823] req@ffff9327a8edbf00 x1632265677873424/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556853830 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:23:51 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 02 20:23:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) reconnecting May 02 20:23:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.7.8@o2ib6) May 02 20:24:12 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556853845/real 1556853845] req@ffff9327a8edbf00 x1632265677873424/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556853852 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:24:12 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 20:24:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) reconnecting May 02 20:24:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.7.8@o2ib6) May 02 20:24:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) reconnecting May 02 20:24:47 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1556853880/real 1556853880] req@ffff9327a8edbf00 x1632265677873424/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556853887 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:24:47 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 02 20:25:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) reconnecting May 02 20:25:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.7.8@o2ib6) May 02 20:25:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:25:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client dc88b1ac-f571-192a-a71e-3c30ff0f97bc (at 10.8.7.8@o2ib6) reconnecting May 02 20:25:36 fir-md1-s2 kernel: LustreError: 122671:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff9327a8edbf00 x1632265677873424 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff930e37749b00/0x1c35e9bdbba996da lrc: 4/0,0 mode: PR/PR res: [0x2400267a5:0x5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x7123f5e32cf21624 expref: 101 pid: 122262 timeout: 379323 lvb_type: 0 May 02 20:25:36 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 02 20:25:36 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 141s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff930e37749b00/0x1c35e9bdbba996da lrc: 3/0,0 mode: PR/PR res: [0x2400267a5:0x5:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x7123f5e32cf21624 expref: 102 pid: 122262 timeout: 0 lvb_type: 0 May 02 20:26:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 
0840f217-cd93-17cc-422e-1eb9c197e388 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931507abf000, cur 1556853976 expire 1556853826 last 1556853749 May 02 20:26:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:30:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:30:35 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 02 20:34:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 066d7298-dc05-7eb2-47bb-6555ef5cc631 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93206d2ac800, cur 1556854446 expire 1556854296 last 1556854219 May 02 20:34:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 20:41:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 276f680b-b322-5691-4bcf-a98e1795221f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93203fc59400, cur 1556854887 expire 1556854737 last 1556854660 May 02 20:41:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 20:45:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:45:28 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 02 20:54:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8c2e7b34-1340-8b6e-9bb9-f4ee3d090d2e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930098d9ec00, cur 1556855683 expire 1556855533 last 1556855456 May 02 20:54:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 20:55:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 20:55:05 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 20:58:22 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556855895/real 1556855895] req@ffff932f1c02d100 x1632266025549744/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556855902 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 20:58:22 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 02 20:58:30 fir-md1-s2 kernel: Lustre: 122024:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9316386d2d00 x1631547243559696/t0(0) o101->f653589b-eefb-abf7-a1b5-4c7dd788fc78@10.8.7.16@o2ib6:5/0 lens 480/568 e 1 to 0 dl 1556855915 ref 2 fl Interpret:/0/0 rc 0/0 May 02 20:58:36 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556855909/real 1556855909] req@ffff932f1c02d100 x1632266025549744/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556855916 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:58:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f653589b-eefb-abf7-a1b5-4c7dd788fc78 (at 10.8.7.16@o2ib6) reconnecting May 02 20:58:36 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 02 20:58:58 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556855930/real 1556855930] req@ffff932f1c02d100 x1632266025549744/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 
e 0 to 1 dl 1556855937 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:58:58 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 20:58:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f653589b-eefb-abf7-a1b5-4c7dd788fc78 (at 10.8.7.16@o2ib6) reconnecting May 02 20:59:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f653589b-eefb-abf7-a1b5-4c7dd788fc78 (at 10.8.7.16@o2ib6) reconnecting May 02 20:59:33 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556855966/real 1556855966] req@ffff932f1c02d100 x1632266025549744/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556855973 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 20:59:33 fir-md1-s2 kernel: Lustre: 122189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 02 20:59:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f653589b-eefb-abf7-a1b5-4c7dd788fc78 (at 10.8.7.16@o2ib6) reconnecting May 02 21:00:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f653589b-eefb-abf7-a1b5-4c7dd788fc78 (at 10.8.7.16@o2ib6) reconnecting May 02 21:00:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0e969bb3-055a-5246-ded1-15bd8520d2e1 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932f0d25d000, cur 1556856017 expire 1556855867 last 1556855790 May 02 21:00:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 21:28:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a886a0d6-d9f2-9bc7-ea16-457c939e6e92 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9319c1234000, cur 1556857721 expire 1556857571 last 1556857494 May 02 21:28:41 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 21:29:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:29:18 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages May 02 21:35:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7ad6038c-e130-9e19-0552-3b91313cb0c0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93163e2c7000, cur 1556858138 expire 1556857988 last 1556857911 May 02 21:35:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 21:36:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:36:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 21:46:41 fir-md1-s2 kernel: LustreError: 122638:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff9306fbe2c800 x1632266508726560 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931b3f720d80/0x1c35e9be1bd6749a lrc: 4/0,0 mode: PR/PR res: [0x24002670e:0x23:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0xf497ee6af17a63ac expref: 148 pid: 122358 timeout: 384188 lvb_type: 0 May 02 21:46:41 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 02 21:46:41 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931b3f720d80/0x1c35e9be1bd6749a lrc: 3/0,0 mode: PR/PR res: [0x24002670e:0x23:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0xf497ee6af17a63ac expref: 149 pid: 122358 timeout: 0 lvb_type: 0 May 02 21:46:55 
fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 89fe1949-4d94-3b7b-9566-994a2e556f6a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93272e692000, cur 1556858815 expire 1556858665 last 1556858588 May 02 21:46:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 21:47:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 21:47:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 21:54:44 fir-md1-s2 kernel: Lustre: 122313:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556859276/real 1556859276] req@ffff933c38375d00 x1632266586373344/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556859283 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 21:54:44 fir-md1-s2 kernel: Lustre: 122313:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 02 21:54:51 fir-md1-s2 kernel: Lustre: 122361:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93405037ce00 x1631584247944320/t296555653806(0) o36->bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f@10.9.106.19@o2ib4:26/0 lens 488/3152 e 1 to 0 dl 1556859296 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:54:52 fir-md1-s2 kernel: Lustre: 121418:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932065ee1200 x1631546982726992/t296555653852(0) o36->c6274fea-902b-f634-b05d-2d475f88b926@10.9.104.71@o2ib4:27/0 lens 488/3152 e 1 to 0 dl 1556859297 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:54:52 fir-md1-s2 kernel: Lustre: 121418:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 02 21:54:52 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556859285/real 1556859285] req@ffff930eae753c00 
x1632266587810448/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556859292 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 21:54:52 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 02 21:54:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f (at 10.9.106.19@o2ib4) reconnecting May 02 21:54:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.19@o2ib4) May 02 21:55:00 fir-md1-s2 kernel: Lustre: 122040:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9305652f0900 x1631575465969488/t0(0) o101->661f0cfa-e148-dc98-69cd-517192e597e7@10.8.7.3@o2ib6:5/0 lens 480/568 e 1 to 0 dl 1556859305 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:55:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 661f0cfa-e148-dc98-69cd-517192e597e7 (at 10.8.7.3@o2ib6) reconnecting May 02 21:55:06 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 02 21:55:08 fir-md1-s2 kernel: Lustre: 122140:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932072af9e00 x1631325466116688/t0(0) o101->85c0e001-f923-b111-e64e-bb958e792ce8@10.8.25.25@o2ib6:13/0 lens 480/568 e 0 to 0 dl 1556859313 ref 2 fl Interpret:/0/0 rc 0/0 May 02 21:55:08 fir-md1-s2 kernel: Lustre: 122222:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556859301/real 1556859301] req@ffff9318db763c00 x1632266589354368/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556859308 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 21:55:08 fir-md1-s2 kernel: Lustre: 122222:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 02 21:55:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 85c0e001-f923-b111-e64e-bb958e792ce8 (at 10.8.25.25@o2ib6) reconnecting May 02 21:55:36 
fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7da2364c-273e-9791-279a-dee1848c518b (at 10.8.25.6@o2ib6) reconnecting May 02 21:55:36 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 02 21:55:41 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556859334/real 1556859334] req@ffff930eae753c00 x1632266587810448/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556859341 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 21:55:41 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 28 previous similar messages May 02 21:56:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 661f0cfa-e148-dc98-69cd-517192e597e7 (at 10.8.7.3@o2ib6) reconnecting May 02 21:56:09 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 02 21:56:46 fir-md1-s2 kernel: Lustre: 122222:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556859399/real 1556859399] req@ffff9318db763c00 x1632266589354368/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556859406 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 21:56:46 fir-md1-s2 kernel: Lustre: 122222:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 54 previous similar messages May 02 21:57:00 fir-md1-s2 kernel: LustreError: 122222:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff9318db763c00 x1632266589354368 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff93171fb87980/0x1c35e9be26ced675 lrc: 4/0,0 mode: PR/PR res: [0x240026077:0x1d:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb92337a04fb057ad expref: 98 pid: 122168 timeout: 384808 lvb_type: 0 May 02 21:57:00 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking 
callback time out: rc -107 May 02 21:57:00 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 126s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff93171fb87980/0x1c35e9be26ced675 lrc: 3/0,0 mode: PR/PR res: [0x240026077:0x1d:0x0].0x0 bits 0x5b/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xb92337a04fb057ad expref: 99 pid: 122168 timeout: 0 lvb_type: 0 May 02 21:57:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ca189637-f46d-b577-70ce-b9de98e9c123 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93191a6a5400, cur 1556859479 expire 1556859329 last 1556859252 May 02 22:09:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1bf1e1e9-c78f-dcda-1c43-3ea5bf95d5b3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93193dfba400, cur 1556860177 expire 1556860027 last 1556859950 May 02 22:10:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:10:08 fir-md1-s2 kernel: Lustre: Skipped 35 previous similar messages May 02 22:16:46 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556860599/real 1556860599] req@ffff933f9af3dd00 x1632266805071968/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556860606 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 22:16:46 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages May 02 22:16:54 fir-md1-s2 kernel: Lustre: 122382:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff933d442f1500 x1631894478361504/t0(0) o101->b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4@10.9.106.14@o2ib4:29/0 lens 480/568 e 1 to 0 dl 1556860619 ref 2 fl Interpret:/0/0 rc 0/0 May 02 
22:16:54 fir-md1-s2 kernel: Lustre: 122382:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 02 22:17:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4 (at 10.9.106.14@o2ib4) reconnecting May 02 22:17:00 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 02 22:17:07 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556860620/real 1556860620] req@ffff933f9af3dd00 x1632266805071968/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556860627 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 22:17:07 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 22:17:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4 (at 10.9.106.14@o2ib4) reconnecting May 02 22:17:42 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556860655/real 1556860655] req@ffff933f9af3dd00 x1632266805071968/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556860662 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 22:17:42 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 02 22:17:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4 (at 10.9.106.14@o2ib4) reconnecting May 02 22:18:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4 (at 10.9.106.14@o2ib4) reconnecting May 02 22:18:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:18:52 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556860725/real 1556860725] req@ffff933f9af3dd00 x1632266805071968/t0(0) 
o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556860732 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 22:18:52 fir-md1-s2 kernel: Lustre: 122267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 02 22:18:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e4e678a0-7b8a-1717-2775-ddbcc8fc22df (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932839e83800, cur 1556860739 expire 1556860589 last 1556860512 May 02 22:18:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:43:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0b5f092b-97a6-9972-cdc1-5734afdc9cdd (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930078b7e000, cur 1556862186 expire 1556862036 last 1556861959 May 02 22:43:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:46:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7884db4b-2231-5156-0f6e-711bf297b217 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932de2303c00, cur 1556862396 expire 1556862246 last 1556862169 May 02 22:46:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:47:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 02 22:47:03 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 02 22:49:10 fir-md1-s2 kernel: Lustre: 122292:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556862543/real 1556862543] req@ffff931caffc2d00 x1632267115163968/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556862550 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 22:49:18 fir-md1-s2 kernel: Lustre: 122141:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931ab7bc7500 x1631536334912336/t0(0) o101->6519c919-e2e8-89a4-1fa4-d0ad3d892e61@10.8.27.20@o2ib6:23/0 lens 480/568 e 1 to 0 dl 1556862563 ref 2 fl Interpret:/0/0 rc 0/0 May 02 22:49:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 02 22:49:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:49:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.20@o2ib6) May 02 22:49:31 fir-md1-s2 kernel: Lustre: 122292:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556862564/real 1556862564] req@ffff931caffc2d00 x1632267115163968/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556862571 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 22:49:31 fir-md1-s2 kernel: Lustre: 122292:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 22:49:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 02 22:49:55 fir-md1-s2 kernel: Lustre: 
122719:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930f22b30f00 x1631610592936128/t0(0) o101->ea216aa1-3f9e-6bba-cc60-e74ebefab95f@10.9.106.35@o2ib4:0/0 lens 480/568 e 1 to 0 dl 1556862600 ref 2 fl Interpret:/0/0 rc 0/0 May 02 22:50:04 fir-md1-s2 kernel: Lustre: 122057:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556862597/real 1556862597] req@ffff932d9fe56c00 x1632267121913776/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556862604 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 22:50:04 fir-md1-s2 kernel: Lustre: 122057:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 02 22:50:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 02 22:50:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:50:08 fir-md1-s2 kernel: Lustre: 122873:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff933032aeb900 x1631543680220752/t0(0) o101->393b5fde-e98f-60d4-0397-472006e679db@10.8.27.16@o2ib6:13/0 lens 480/568 e 0 to 0 dl 1556862613 ref 2 fl Interpret:/0/0 rc 0/0 May 02 22:50:20 fir-md1-s2 kernel: LustreError: 122292:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff931caffc2d00 x1632267115163968 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931c66f5f740/0x1c35e9be6ff4d73e lrc: 4/0,0 mode: PR/PR res: [0x240026062:0x10:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x1ab464ff9ccc74ff expref: 145 pid: 121991 timeout: 388008 lvb_type: 0 May 02 22:50:20 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 02 22:50:20 
fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 77s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931c66f5f740/0x1c35e9be6ff4d73e lrc: 3/0,0 mode: PR/PR res: [0x240026062:0x10:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x1ab464ff9ccc74ff expref: 146 pid: 121991 timeout: 0 lvb_type: 0 May 02 22:51:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cfd376f5-99eb-64b3-852c-7ec755f9647e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b8eccd000, cur 1556862662 expire 1556862512 last 1556862435 May 02 22:51:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 22:56:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 22:56:50 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 02 22:57:14 fir-md1-s2 kernel: Lustre: 122126:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556863027/real 1556863027] req@ffff9329863b8900 x1632267194213040/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556863034 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 22:57:14 fir-md1-s2 kernel: Lustre: 122126:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 02 22:57:32 fir-md1-s2 kernel: Lustre: 122353:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932df0718300 x1631544634042992/t0(0) o101->cead7d10-a870-f1c4-8ddf-757d1d8e738a@10.9.104.67@o2ib4:7/0 lens 480/568 e 0 to 0 dl 1556863057 ref 2 fl Interpret:/0/0 rc 0/0 May 02 22:57:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cead7d10-a870-f1c4-8ddf-757d1d8e738a (at 10.9.104.67@o2ib4) reconnecting May 02 22:57:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar 
message May 02 22:59:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cead7d10-a870-f1c4-8ddf-757d1d8e738a (at 10.9.104.67@o2ib4) reconnecting May 02 22:59:12 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 02 22:59:27 fir-md1-s2 kernel: Lustre: 122126:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556863160/real 1556863160] req@ffff9329863b8900 x1632267194213040/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556863167 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 22:59:27 fir-md1-s2 kernel: Lustre: 122126:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages May 02 22:59:41 fir-md1-s2 kernel: LustreError: 122126:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff9329863b8900 x1632267194213040 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff93269aaf2880/0x1c35e9be755f9a81 lrc: 4/0,0 mode: PR/PR res: [0x2400267b2:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7166586f1c0954b6 expref: 100 pid: 122000 timeout: 388562 lvb_type: 0 May 02 22:59:41 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 02 22:59:41 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff93269aaf2880/0x1c35e9be755f9a81 lrc: 3/0,0 mode: PR/PR res: [0x2400267b2:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7166586f1c0954b6 expref: 101 pid: 122000 timeout: 0 lvb_type: 0 May 02 22:59:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f2785026-ba97-0a64-f4a9-63b058631860 (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff933bf33fa800, cur 1556863194 expire 1556863044 last 1556862967 May 02 23:06:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 527c5c76-e36c-6212-b0c9-7fb694ea6bf9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933e2cec2000, cur 1556863566 expire 1556863416 last 1556863339 May 02 23:07:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 23:07:01 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 02 23:07:22 fir-md1-s2 kernel: Lustre: 122731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932b1fe57800 x1631534791564592/t0(0) o101->a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56@10.9.106.24@o2ib4:27/0 lens 480/568 e 1 to 0 dl 1556863647 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:07:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a62b9648-73d4-4e84-cbc3-4dd2cc8c6b56 (at 10.9.106.24@o2ib4) reconnecting May 02 23:07:38 fir-md1-s2 kernel: Lustre: 121988:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556863627/real 1556863627] req@ffff932bac2d5400 x1632267284805184/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556863658 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 23:07:38 fir-md1-s2 kernel: Lustre: 121988:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 02 23:08:09 fir-md1-s2 kernel: Lustre: 122004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932ad87b9500 x1631565240796240/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:14/0 lens 480/568 e 1 to 0 dl 1556863694 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:09:42 fir-md1-s2 kernel: LustreError: 121988:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) 
failed to reply to blocking AST (req@ffff932bac2d5400 x1632267284805184 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff932de7e0a880/0x1c35e9be816f6167 lrc: 4/0,0 mode: PR/PR res: [0x2400266ee:0x1a:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xd9606bfc7064136c expref: 204 pid: 122057 timeout: 389139 lvb_type: 0 May 02 23:09:42 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 02 23:09:42 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff932de7e0a880/0x1c35e9be816f6167 lrc: 3/0,0 mode: PR/PR res: [0x2400266ee:0x1a:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xd9606bfc7064136c expref: 205 pid: 122057 timeout: 0 lvb_type: 0 May 02 23:10:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 746bfd56-d0d2-3dc8-6c36-864fa752d244 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b5af90000, cur 1556863844 expire 1556863694 last 1556863617 May 02 23:10:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 23:15:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f46af3cd-c06c-a547-e772-8a943f29af08 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931ce5bdcc00, cur 1556864127 expire 1556863977 last 1556863900 May 02 23:18:23 fir-md1-s2 kernel: Lustre: 122358:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556864296/real 1556864296] req@ffff931844729b00 x1632267385897376/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556864303 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 23:18:23 fir-md1-s2 kernel: Lustre: 122358:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 02 23:18:31 fir-md1-s2 kernel: Lustre: 122093:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931f7020cb00 x1631543680408928/t0(0) o101->393b5fde-e98f-60d4-0397-472006e679db@10.8.27.16@o2ib6:6/0 lens 480/568 e 1 to 0 dl 1556864316 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:18:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 393b5fde-e98f-60d4-0397-472006e679db (at 10.8.27.16@o2ib6) reconnecting May 02 23:18:37 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages May 02 23:18:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.16@o2ib6) May 02 23:18:37 fir-md1-s2 kernel: Lustre: Skipped 16 previous similar messages May 02 23:19:04 fir-md1-s2 kernel: Lustre: 122661:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930ed2bbfb00 x1631894478783072/t0(0) o101->b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4@10.9.106.14@o2ib4:9/0 lens 480/568 e 1 to 0 dl 1556864349 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:20:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 36049bdb-9701-cbf2-c7a5-67dcc86922f0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932a15fd2400, cur 1556864413 expire 1556864263 last 1556864186 May 02 23:20:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 23:20:13 fir-md1-s2 kernel: Lustre: 122229:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (83:1s); client may timeout. req@ffff930ed2bbfb00 x1631894478783072/t0(0) o101->b0ace5e9-c2a4-c49d-1c2e-c6c1f30dfaa4@10.9.106.14@o2ib4:9/0 lens 480/536 e 1 to 0 dl 1556864412 ref 1 fl Complete:/0/0 rc 0/0 May 02 23:27:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e2c3d463-602c-f780-2a86-11143700f970 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930eb6b62800, cur 1556864848 expire 1556864698 last 1556864621 May 02 23:27:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 23:31:47 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556865100/real 1556865100] req@ffff9329fe718000 x1632267529904912/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556865107 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 23:31:47 fir-md1-s2 kernel: Lustre: 122671:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 55 previous similar messages May 02 23:31:55 fir-md1-s2 kernel: Lustre: 121636:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932bd4683900 x1631575466969920/t0(0) o101->661f0cfa-e148-dc98-69cd-517192e597e7@10.8.7.3@o2ib6:0/0 lens 480/568 e 1 to 0 dl 1556865120 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:31:55 fir-md1-s2 kernel: Lustre: 121636:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 02 23:32:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 661f0cfa-e148-dc98-69cd-517192e597e7 (at 10.8.7.3@o2ib6) reconnecting May 02 23:32:01 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages May 02 23:32:01 
fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 661f0cfa-e148-dc98-69cd-517192e597e7 (at 10.8.7.3@o2ib6) May 02 23:32:01 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages May 02 23:32:36 fir-md1-s2 kernel: LustreError: 122671:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff9329fe718000 x1632267529904912 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931f9fe14c80/0x1c35e9be9e706490 lrc: 4/0,0 mode: PR/PR res: [0x240026742:0xa:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x516dcf026a592f36 expref: 100 pid: 122234 timeout: 390544 lvb_type: 0 May 02 23:32:36 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 02 23:32:36 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 56s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931f9fe14c80/0x1c35e9be9e706490 lrc: 3/0,0 mode: PR/PR res: [0x240026742:0xa:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x60000400000020 nid: 10.8.27.23@o2ib6 remote: 0x516dcf026a592f36 expref: 101 pid: 122234 timeout: 0 lvb_type: 0 May 02 23:33:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 27614c77-11fd-6dcf-e26a-cffa7b35755b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9327502bf000, cur 1556865205 expire 1556865055 last 1556864978 May 02 23:33:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 23:44:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 02 23:44:50 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 02 23:44:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 97b571d3-7066-7518-9f4d-1fc69dc6a7d5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9329f52fc000, cur 1556865893 expire 1556865743 last 1556865666 May 02 23:44:53 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 02 23:55:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.9@o2ib6) May 02 23:55:40 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 02 23:58:18 fir-md1-s2 kernel: Lustre: 122026:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556866691/real 1556866691] req@ffff930af9f76900 x1632267793649872/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556866698 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 02 23:58:18 fir-md1-s2 kernel: Lustre: 122026:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages May 02 23:58:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1230389d-aeba-96dd-fca1-60baa2e7677a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93272e76c000, cur 1556866703 expire 1556866553 last 1556866476 May 02 23:58:23 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 02 23:58:26 fir-md1-s2 kernel: Lustre: 122354:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff933c73f58000 x1631534706600912/t0(0) o101->6523185d-2af8-5c1c-4bf2-2f57eecfc4af@10.9.106.38@o2ib4:1/0 lens 584/3264 e 1 to 0 dl 1556866711 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:58:27 fir-md1-s2 kernel: Lustre: 122354:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff933e9d3e7b00 x1631559557762640/t0(0) o101->d5482dec-dd94-00f8-737a-5b6b97429b46@10.9.106.50@o2ib4:2/0 lens 584/3264 e 1 to 0 dl 1556866712 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:58:27 fir-md1-s2 kernel: Lustre: 122354:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 02 23:58:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f37c3da1-0e56-86e1-dca2-c29b3ae80868 (at 10.9.112.9@o2ib4) reconnecting May 02 23:58:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 02 23:58:55 fir-md1-s2 kernel: Lustre: 121981:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93202c220300 x1632257517885152/t0(0) o101->87dc7894-17ad-35e9-debd-4fd0a600b9db@10.8.1.3@o2ib6:0/0 lens 576/3264 e 0 to 0 dl 1556866740 ref 2 fl Interpret:/0/0 rc 0/0 May 02 23:59:35 fir-md1-s2 kernel: Lustre: 122026:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556866768/real 1556866768] req@ffff930af9f76900 x1632267793649872/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556866775 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 02 23:59:35 fir-md1-s2 kernel: Lustre: 122026:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages May 02 23:59:35 fir-md1-s2 kernel: 
LustreError: 122026:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff930af9f76900 x1632267793649872 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff9315a6ed3f00/0x1c35e9bec61b8cb8 lrc: 4/0,0 mode: PR/PR res: [0x240024003:0x1669d:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x20a7a9a901ba3c3c expref: 94 pid: 122159 timeout: 392163 lvb_type: 0 May 02 23:59:35 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 02 23:59:35 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 84s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff9315a6ed3f00/0x1c35e9bec61b8cb8 lrc: 3/0,0 mode: PR/PR res: [0x240024003:0x1669d:0x0].0x0 bits 0x13/0x0 rrc: 11 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x20a7a9a901ba3c3c expref: 95 pid: 122159 timeout: 0 lvb_type: 0 May 02 23:59:35 fir-md1-s2 kernel: Lustre: 122026:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (83:1s); client may timeout. 
req@ffff9305ac29d400 x1631284028592544/t296557825414(0) o36->f37c3da1-0e56-86e1-dca2-c29b3ae80868@10.9.112.9@o2ib4:1/0 lens 504/416 e 1 to 0 dl 1556866774 ref 1 fl Complete:/0/0 rc 0/0 May 03 00:08:23 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 03 00:08:23 fir-md1-s2 kernel: Lustre: 122262:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 40 previous similar messages May 03 00:08:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 00:08:35 fir-md1-s2 kernel: Lustre: Skipped 19 previous similar messages May 03 00:12:40 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4df8c6b1-51dc-6010-4fe0-9d1cd6584ae6 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933ce4fa2000, cur 1556867560 expire 1556867410 last 1556867333 May 03 00:12:40 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 03 00:23:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client fb911caf-e007-d19d-0ab4-1d2f1342b991 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93155764d400, cur 1556868188 expire 1556868038 last 1556867961 May 03 00:23:08 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 00:23:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 00:23:40 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 03 00:33:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0ea3dd46-e104-3244-2691-cb70c35dd1c4 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933f8cef6400, cur 1556868838 expire 1556868688 last 1556868611 May 03 00:33:58 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 00:37:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 00:37:16 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 00:45:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 70acfceb-171a-b002-d478-76bd6c68048d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93194deddc00, cur 1556869550 expire 1556869400 last 1556869323 May 03 00:45:50 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 00:48:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 00:48:46 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 03 00:58:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f57ce904-ce0a-04b9-b70d-bd94c750bb0c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933d89a19800, cur 1556870315 expire 1556870165 last 1556870088 May 03 00:58:35 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 00:59:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 00:59:04 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:03:04 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 01:10:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client dca54525-8481-63ab-8976-a7771ede0944 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b572df000, cur 1556871039 expire 1556870889 last 1556870812 May 03 01:10:39 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:14:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 01:14:23 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:21:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 35f63ed0-5e1b-a5dd-b9a5-15c8f8126be3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ee17a8000, cur 1556871706 expire 1556871556 last 1556871479 May 03 01:21:46 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:30:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 01:30:24 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 01:37:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 71b07175-2e63-3b03-c72d-44a8ae90a894 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9318fc766800, cur 1556872629 expire 1556872479 last 1556872402 May 03 01:37:09 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:44:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 01:44:07 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:50:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e67abb51-aaa3-19be-3dbc-4caacc64e59d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932fe0bf8800, cur 1556873452 expire 1556873302 last 1556873225 May 03 01:50:52 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 01:54:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 01:54:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 02:03:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 502638b4-fee2-1b8c-cf39-c17e5e932383 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931cc6381c00, cur 1556874193 expire 1556874043 last 1556873966 May 03 02:03:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 02:27:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 60f2ea10-0e78-5603-438a-5e9b49e30074 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931fff737400, cur 1556875626 expire 1556875476 last 1556875399 May 03 02:27:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 02:27:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 02:27:54 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 03:10:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e37145dc-5f1e-c4eb-43eb-3b341d7e0005 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933861b54000, cur 1556878212 expire 1556878062 last 1556877985 May 03 03:10:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:11:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 03:11:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:20:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:20:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:21:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4140f03b-1dfc-f0a7-c960-8945edd752ff (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93404db69000, cur 1556878869 expire 1556878719 last 1556878642 May 03 03:21:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:22:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 03:22:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:25:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4696c7f9-bd91-5bb6-81ad-8f0f8ff2f6cd (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b31fa9400, cur 1556879114 expire 1556878964 last 1556878887 May 03 03:25:14 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 03:25:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:25:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:28:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:28:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.27.23@o2ib6) May 03 03:29:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 3d436b01-383b-4068-0875-047d0a471c98 (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9316a7bf9c00, cur 1556879347 expire 1556879197 last 1556879120 May 03 03:29:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:29:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3d436b01-383b-4068-0875-047d0a471c98 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931e17f8a000, cur 1556879356 expire 1556879206 last 1556879129 May 03 03:30:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5df88fe2-a1bd-768a-6aaa-497686cb92a0 (at 10.8.26.4@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. exp ffff931968f40400, cur 1556879423 expire 1556879273 last 1556879206 May 03 03:31:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 03:31:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:38:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 72d6f3a7-cca7-363a-96a9-df666766080a (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931725ecd400, cur 1556879915 expire 1556879765 last 1556879688 May 03 03:38:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 03:39:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.33@o2ib6) May 03 03:39:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 04:55:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 03a03f15-3bab-e9d9-9faa-41218638973c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93161e2af000, cur 1556884527 expire 1556884377 last 1556884300 May 03 04:55:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 04:56:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 04:56:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 05:04:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 51109e2c-5a1c-a049-2254-66d0dd5d889c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff934067e1c000, cur 1556885067 expire 1556884917 last 1556884840 May 03 05:04:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 05:04:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 05:04:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 05:13:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 73d46309-aa6d-058e-b935-954ed7c04d8b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933d7e736c00, cur 1556885637 expire 1556885487 last 1556885410 May 03 05:13:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 05:15:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 05:15:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:32:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 795cda94-55c2-2172-3b32-4d79ce01f5f4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932c0e267400, cur 1556890377 expire 1556890227 last 1556890150 May 03 06:32:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:33:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 06:33:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:36:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e3844d58-7b07-a82a-d04e-4fefd5bd1f79 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93030f2a0000, cur 1556890595 expire 1556890445 last 1556890368 May 03 06:36:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:37:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 06:37:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:40:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 86080157-4cef-640a-5df7-03ad2489370b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93082bf50800, cur 1556890804 expire 1556890654 last 1556890577 May 03 06:40:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:40:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 06:40:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:48:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0b56fcd1-9fe0-663c-f221-1624bfee89ef (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930ff9219400, cur 1556891295 expire 1556891145 last 1556891068 May 03 06:48:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:48:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 06:48:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:49:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 42457870-9868-0b26-4344-272e3da2dd49 (at 10.8.26.4@o2ib6) in 214 seconds. I think it's dead, and I am evicting it. exp ffff930c38ee2800, cur 1556891371 expire 1556891221 last 1556891157 May 03 06:49:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:51:46 fir-md1-s2 kernel: Lustre: 122285:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556891499/real 1556891499] req@ffff930e0029b600 x1632270840943408/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556891506 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 06:51:46 fir-md1-s2 kernel: Lustre: 122285:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 06:51:54 fir-md1-s2 kernel: Lustre: 122664:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930e45b0ad00 x1632092127388816/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:29/0 lens 480/568 e 1 to 0 dl 1556891519 ref 2 fl Interpret:/0/0 rc 0/0 May 03 06:52:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 06:52:00 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages May 03 06:52:07 fir-md1-s2 kernel: Lustre: 122285:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556891520/real 1556891520] req@ffff930e0029b600 x1632270840943408/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556891527 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 May 03 06:52:07 fir-md1-s2 kernel: Lustre: 122285:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 06:52:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 06:52:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 06:52:49 fir-md1-s2 kernel: Lustre: 122285:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556891562/real 1556891562] req@ffff930e0029b600 x1632270840943408/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556891569 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 06:52:49 fir-md1-s2 kernel: Lustre: 122285:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 03 06:53:15 fir-md1-s2 kernel: Lustre: 122133:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931620383600 x1631565267232448/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:20/0 lens 480/568 e 1 to 0 dl 1556891600 ref 2 fl Interpret:/0/0 rc 0/0 May 03 06:53:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 06:53:27 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 03 06:53:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.24@o2ib6) May 03 06:53:42 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 03 06:53:59 fir-md1-s2 kernel: LustreError: 122285:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff930e0029b600 x1632270840943408 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff932a3eb81b00/0x1c35e9c0ac00722d lrc: 4/0,0 mode: PW/PW res: [0x240025f19:0x54:0x0].0x0 bits 0x40/0x0 rrc: 8 
type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x59ffdeae0c85d6f9 expref: 91 pid: 122013 timeout: 0 lvb_type: 0 May 03 06:53:59 fir-md1-s2 kernel: LustreError: 122285:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 03 06:53:59 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 06:53:59 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message May 03 06:53:59 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 306s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff932a3eb81b00/0x1c35e9c0ac00722d lrc: 4/0,0 mode: PW/PW res: [0x240025f19:0x54:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x59ffdeae0c85d6f9 expref: 92 pid: 122013 timeout: 0 lvb_type: 0 May 03 06:55:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e37e0e64-2f2e-0f99-9039-20009038f1f8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930f5e7f0c00, cur 1556891714 expire 1556891564 last 1556891487 May 03 06:55:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 06:57:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c759cd7f-a22b-cf06-3a32-a610320b3d8a (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932b49ef3000, cur 1556891868 expire 1556891718 last 1556891641 May 03 07:04:51 fir-md1-s2 kernel: Lustre: 122712:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0003: Failure to clear the changelog for user 1: -22 May 03 07:04:51 fir-md1-s2 kernel: Lustre: 122712:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 59 previous similar messages May 03 07:04:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 979375a7-f3b9-fdf4-cb4e-0b72f19a98b4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9309d8b49c00, cur 1556892294 expire 1556892144 last 1556892067 May 03 07:04:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 07:05:11 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556892304/real 1556892304] req@ffff930c467e0c00 x1632270936451520/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556892311 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 07:05:11 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages May 03 07:05:19 fir-md1-s2 kernel: Lustre: 122068:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930d827bbc00 x1631584251643872/t0(0) o101->bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f@10.9.106.19@o2ib4:24/0 lens 480/568 e 1 to 0 dl 1556892324 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:05:25 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556892318/real 1556892318] req@ffff930c467e0c00 x1632270936451520/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556892325 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 07:05:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f (at 10.9.106.19@o2ib4) reconnecting May 03 07:05:25 fir-md1-s2 
kernel: Lustre: Skipped 2 previous similar messages May 03 07:05:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.19@o2ib4) May 03 07:05:25 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 07:05:25 fir-md1-s2 kernel: Lustre: 122199:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 07:05:39 fir-md1-s2 kernel: LustreError: 122199:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff930c467e0c00 x1632270936451520 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931b08ea6e40/0x1c35e9c0b7e79258 lrc: 4/0,0 mode: PR/PR res: [0x2400267b3:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7272a419ce824cfc expref: 88 pid: 122704 timeout: 417727 lvb_type: 0 May 03 07:05:39 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 03 07:05:39 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931b08ea6e40/0x1c35e9c0b7e79258 lrc: 3/0,0 mode: PR/PR res: [0x2400267b3:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7272a419ce824cfc expref: 89 pid: 122704 timeout: 0 lvb_type: 0 May 03 07:13:34 fir-md1-s2 kernel: Lustre: 122867:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556892807/real 1556892807] req@ffff933ae3a7cb00 x1632271003155632/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556892814 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 07:13:34 fir-md1-s2 kernel: Lustre: 122867:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 07:13:52 fir-md1-s2 kernel: Lustre: 
122710:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff933d30273f00 x1631544637398048/t0(0) o101->cead7d10-a870-f1c4-8ddf-757d1d8e738a@10.9.104.67@o2ib4:27/0 lens 480/568 e 0 to 0 dl 1556892837 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:13:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cead7d10-a870-f1c4-8ddf-757d1d8e738a (at 10.9.104.67@o2ib4) reconnecting May 03 07:14:16 fir-md1-s2 kernel: Lustre: 122867:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556892849/real 1556892849] req@ffff933ae3a7cb00 x1632271003155632/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556892856 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 07:14:16 fir-md1-s2 kernel: Lustre: 122867:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 03 07:14:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cead7d10-a870-f1c4-8ddf-757d1d8e738a (at 10.9.104.67@o2ib4) reconnecting May 03 07:15:09 fir-md1-s2 kernel: Lustre: 11910:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930c87aa5d00 x1632092258091248/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:14/0 lens 480/568 e 1 to 0 dl 1556892914 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:15:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 07:15:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 07:15:22 fir-md1-s2 kernel: Lustre: 122267:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff933029be1500 x1631584251715312/t0(0) o101->bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f@10.9.106.19@o2ib4:27/0 lens 480/568 e 1 to 0 dl 1556892927 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:15:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 
10.9.106.19@o2ib4) May 03 07:15:28 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 03 07:15:33 fir-md1-s2 kernel: Lustre: 122867:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556892926/real 1556892926] req@ffff933ae3a7cb00 x1632271003155632/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556892933 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 07:15:33 fir-md1-s2 kernel: Lustre: 122867:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages May 03 07:16:01 fir-md1-s2 kernel: LustreError: 122867:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff933ae3a7cb00 x1632271003155632 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff932741690d80/0x1c35e9c0c004f0ce lrc: 4/0,0 mode: PR/PR res: [0x2400267b2:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x1d10e71d7e3d97cc expref: 86 pid: 122253 timeout: 418342 lvb_type: 0 May 03 07:16:01 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 03 07:16:01 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff932741690d80/0x1c35e9c0c004f0ce lrc: 3/0,0 mode: PR/PR res: [0x2400267b2:0x1:0x0].0x0 bits 0x5b/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x1d10e71d7e3d97cc expref: 87 pid: 122253 timeout: 0 lvb_type: 0 May 03 07:16:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4d72d8bf-7870-a7b1-6dd1-4412f3232f67 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9304246d0000, cur 1556892987 expire 1556892837 last 1556892760 May 03 07:16:27 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 03 07:30:27 fir-md1-s2 kernel: Lustre: 122732:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556893820/real 1556893820] req@ffff932f90a50300 x1632271146010352/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556893827 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 07:30:27 fir-md1-s2 kernel: Lustre: 122732:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages May 03 07:30:35 fir-md1-s2 kernel: Lustre: 122720:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932d02668300 x1631584251821568/t0(0) o101->bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f@10.9.106.19@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1556893840 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:30:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client bdad8a00-34c7-2f9a-b17d-c5a4e4bbe54f (at 10.9.106.19@o2ib4) reconnecting May 03 07:30:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 07:30:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.19@o2ib4) May 03 07:30:42 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages May 03 07:30:46 fir-md1-s2 kernel: Lustre: 122637:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931055253000 x1632092350101072/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:21/0 lens 480/568 e 1 to 0 dl 1556893851 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:30:48 fir-md1-s2 kernel: Lustre: 122732:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556893841/real 1556893841] req@ffff932f90a50300 x1632271146010352/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556893848 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 May 03 07:30:48 fir-md1-s2 kernel: Lustre: 122732:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 03 07:30:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 07:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 07:31:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 07:31:27 fir-md1-s2 kernel: Lustre: 122303:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556893880/real 1556893880] req@ffff930ea23bd700 x1632271147754480/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556893887 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 07:31:27 fir-md1-s2 kernel: Lustre: 122303:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 03 07:31:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 03 07:31:56 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 07:32:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7d8e42fe-bcd7-07b8-347e-0321e1cca86b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9307557c0000, cur 1556893939 expire 1556893789 last 1556893712 May 03 07:40:25 fir-md1-s2 kernel: Lustre: 122019:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93202b6ca100 x1631654448609872/t0(0) o101->e891cc28-9c10-be1b-29fe-00592513d891@10.9.101.41@o2ib4:0/0 lens 480/568 e 1 to 0 dl 1556894430 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client e891cc28-9c10-be1b-29fe-00592513d891 (at 10.9.101.41@o2ib4) reconnecting May 03 07:40:31 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 03 07:40:41 fir-md1-s2 kernel: Lustre: 121577:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556894410/real 1556894410] req@ffff931760209e00 x1632271228969296/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556894441 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 07:40:41 fir-md1-s2 kernel: Lustre: 121577:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages May 03 07:40:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.41@o2ib4) May 03 07:40:52 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 03 07:40:55 fir-md1-s2 kernel: Lustre: 122214:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff930fce3ad700 x1632092397572608/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:0/0 lens 480/568 e 0 to 0 dl 1556894460 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:41:43 fir-md1-s2 kernel: Lustre: 122637:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93084efba700 x1631565267551376/t0(0) o101->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:18/0 lens 480/568 e 0 to 0 dl 1556894508 ref 2 fl Interpret:/0/0 rc 0/0 May 03 07:42:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard 
from client 4127c4d1-c98e-c193-0d1c-f3cd378b503a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9317ec652400, cur 1556894544 expire 1556894394 last 1556894317 May 03 07:42:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:11:40 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5a7835e8-621a-db61-297e-ea7cbe12c9c1 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93256827fc00, cur 1556896300 expire 1556896150 last 1556896073 May 03 08:11:40 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 08:12:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.33@o2ib6) May 03 08:12:29 fir-md1-s2 kernel: Lustre: Skipped 14 previous similar messages May 03 08:26:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ef054150-dfc7-5fe1-a382-ea1061512073 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9340412f1c00, cur 1556897180 expire 1556897030 last 1556896953 May 03 08:26:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:32:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 08:32:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:38:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f95662b0-697c-b601-6f35-28b2dbb10221 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930f66600400, cur 1556897927 expire 1556897777 last 1556897700 May 03 08:38:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:39:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 08:39:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:40:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c2b0e589-3588-d905-f10c-bac73af67084 (at 10.8.26.33@o2ib6) in 171 seconds. I think it's dead, and I am evicting it. exp ffff930fc8637c00, cur 1556898003 expire 1556897853 last 1556897832 May 03 08:40:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:41:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a95b25b7-8f79-5668-91fe-fc9eca1a25b2 (at 10.8.26.4@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff9302f1295000, cur 1556898079 expire 1556897929 last 1556897868 May 03 08:41:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:41:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.33@o2ib6) May 03 08:41:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:43:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 08:43:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:45:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client aac82ab8-0a52-2812-c1ac-bb4f1699e373 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930f3fb09400, cur 1556898350 expire 1556898200 last 1556898123 May 03 08:45:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:46:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client aac82ab8-0a52-2812-c1ac-bb4f1699e373 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930096853c00, cur 1556898370 expire 1556898220 last 1556898143 May 03 08:48:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 08:48:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 08:52:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 14afde0d-64e4-8d1e-4b0f-d9b103c23288 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93301b795000, cur 1556898749 expire 1556898599 last 1556898522 May 03 08:53:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 08:53:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 09:00:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2c462a18-098e-7a73-a679-30c3eb8b507d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9319d5ed9c00, cur 1556899208 expire 1556899058 last 1556898981 May 03 09:00:08 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 09:00:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 09:00:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 09:06:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5858d1f9-928b-9b3d-4cd2-ed8b98db075a (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930bf6aed000, cur 1556899560 expire 1556899410 last 1556899333 May 03 09:06:00 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 09:09:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client dd2bd823-a9e8-ea1a-1912-9791d31c41d7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9306977afc00, cur 1556899745 expire 1556899595 last 1556899518 May 03 09:09:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 09:10:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 09:10:45 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 09:17:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client aeea8dfb-dc0f-143b-04d1-34efebe3c6b0 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930c37a17400, cur 1556900279 expire 1556900129 last 1556900052 May 03 09:17:59 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 09:21:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 09:21:07 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 09:24:07 fir-md1-s2 kernel: LustreError: 121650:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff930c2cbb0300 x1632272151459568 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931937f86780/0x1c35e9c125b70eb6 lrc: 4/0,0 mode: PW/PW res: [0x24000f325:0x33d:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x8ce9473e60d182be expref: 77 pid: 122222 timeout: 0 lvb_type: 0 May 03 09:24:07 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 03 09:24:07 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 300s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff932b5af39200/0x1c35e9c125b71d88 lrc: 4/0,0 mode: PW/PW res: [0x24000f325:0x33e:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x8ce9473e60d1848c expref: 78 pid: 122673 timeout: 0 lvb_type: 0 May 03 09:24:07 
fir-md1-s2 kernel: LustreError: 121650:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 03 09:28:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4c64004d-4d28-6daa-a2d1-0bea1ed1377c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9304a9fb3000, cur 1556900936 expire 1556900786 last 1556900709 May 03 09:28:56 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 03 09:34:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 09:34:36 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 03 09:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f6aa1d23-fbe2-bdc2-449f-61ee24385db5 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930303aa8000, cur 1556901631 expire 1556901481 last 1556901404 May 03 09:40:31 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 09:48:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 09:48:52 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 09:53:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c65c7edb-9621-670d-78b0-b6fe77ae1c53 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930c202c3c00, cur 1556902436 expire 1556902286 last 1556902209 May 03 09:53:56 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 10:04:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 10:04:00 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 10:13:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4ea3c528-b342-aa1e-412a-672520eef780 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930ccf36b800, cur 1556903596 expire 1556903446 last 1556903369 May 03 10:13:16 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 10:17:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 10:17:06 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 10:25:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d9c19008-9827-e671-67a3-a52e4344f679 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930f67dc6c00, cur 1556904357 expire 1556904207 last 1556904130 May 03 10:25:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 10:31:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 10:31:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 10:41:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 028e0b5d-b19e-c19b-b3ed-6fa708300cb4 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ee6f32c00, cur 1556905318 expire 1556905168 last 1556905091 May 03 10:41:58 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 10:46:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 10:46:55 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 10:52:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ed1add75-1053-6559-13c5-eb9a03aa78c9 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9325417bf800, cur 1556905922 expire 1556905772 last 1556905695 May 03 10:52:02 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 11:01:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 11:01:19 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 11:16:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0395c81f-44b7-1d8e-ce26-0bbaedce981e (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9326b6628400, cur 1556907415 expire 1556907265 last 1556907188 May 03 11:16:55 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 11:18:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 11:18:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 11:32:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b63d5e73-9cf1-a4e3-05c5-0e6123396f5a (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93181e386000, cur 1556908370 expire 1556908220 last 1556908143 May 03 11:32:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 11:35:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 11:35:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 11:45:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e9b10532-56af-da7e-0d06-e7566a35c264 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933a513bd400, cur 1556909139 expire 1556908989 last 1556908912 May 03 11:45:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 11:47:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 11:47:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:08:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b97f5529-05a6-671b-124b-7d37b1307439 (at 10.8.30.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932077d9fc00, cur 1556910531 expire 1556910381 last 1556910304 May 03 12:08:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:09:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.30.25@o2ib6) May 03 12:09:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:14:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 076f14de-7dac-4951-1807-4b0246581885 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e3fe14400, cur 1556910848 expire 1556910698 last 1556910621 May 03 12:14:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:15:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 12:15:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:21:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0e30e826-5440-f115-6784-21980943bb7a (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932dd9f6b800, cur 1556911315 expire 1556911165 last 1556911088 May 03 12:21:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:23:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 12:23:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 12:40:44 fir-md1-s2 kernel: Lustre: 122282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9316d9607500 x1631535481676032/t0(0) o101->f5d73076-d037-5b0c-43b9-6f23831eb2e5@10.8.8.16@o2ib6:18/0 lens 480/568 e 1 to 0 dl 1556912448 ref 2 fl Interpret:/0/0 rc 0/0 May 03 12:40:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f5d73076-d037-5b0c-43b9-6f23831eb2e5 (at 10.8.8.16@o2ib6) reconnecting May 03 12:40:49 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages May 03 12:40:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.16@o2ib6) May 03 12:40:54 fir-md1-s2 kernel: Lustre: 122345:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9339fba5a100 x1631543133606784/t0(0) o101->5d6a1f91-8e73-2d03-02f2-a06db17ace68@10.9.104.33@o2ib4:29/0 lens 480/568 e 0 to 0 dl 1556912459 ref 2 fl Interpret:/0/0 rc 0/0 May 03 12:40:58 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.104.47@o2ib4 ns: mdt-fir-MDT0003_UUID lock: ffff932f1c1d45c0/0x1c35e9c196387104 lrc: 3/0,0 mode: PW/PW res: [0x2800259ce:0x7f:0x0].0x0 bits 0x40/0x0 rrc: 83 type: IBT flags: 0x60200400000020 nid: 10.9.104.47@o2ib4 remote: 0xeb2683dfdc6125be expref: 9431 pid: 122720 timeout: 437696 lvb_type: 0 May 03 12:40:59 fir-md1-s2 kernel: LustreError: 122185:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff93273fbb6f00 x1632274274433504/t0(0) o104->fir-MDT0003@10.9.104.47@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl 
Rpc:/0/ffffffff rc 0/-1 May 03 12:40:59 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.17@o2ib4 ns: mdt-fir-MDT0003_UUID lock: ffff9315b8f26780/0x1c35e9c1963a5b43 lrc: 3/0,0 mode: PW/PW res: [0x280025c8b:0x7:0x0].0x0 bits 0x40/0x0 rrc: 65 type: IBT flags: 0x60200400000020 nid: 10.9.108.17@o2ib4 remote: 0x1c5c55bc32972ca0 expref: 9378 pid: 122870 timeout: 437697 lvb_type: 0 May 03 12:40:59 fir-md1-s2 kernel: LustreError: 111710:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_convert from 10.9.108.17@o2ib4 arrived at 1556912459 with bad export cookie 2032787652043384404 May 03 12:40:59 fir-md1-s2 kernel: LustreError: 121422:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff933baa741200 x1632274274596000/t0(0) o104->fir-MDT0003@10.9.108.17@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 May 03 12:40:59 fir-md1-s2 kernel: LustreError: 121422:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 3 previous similar messages May 03 13:01:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cb69cd2d-93d3-bee1-db21-fe2b3f3ca9c6 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9314cd643800, cur 1556913710 expire 1556913560 last 1556913483 May 03 13:01:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:03:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 13:03:47 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 13:09:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f177c1d4-235a-5dc4-a6f4-a44cfc44c463 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930815b85400, cur 1556914195 expire 1556914045 last 1556913968 May 03 13:09:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:12:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 13:12:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:17:27 fir-md1-s2 kernel: Lustre: 122317:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0003: Failure to clear the changelog for user 1: -22 May 03 13:30:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 3aedce70-4bf1-4ae8-3845-eb0f10d1fc87 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e15e05400, cur 1556915435 expire 1556915285 last 1556915208 May 03 13:30:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:31:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 13:31:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:31:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d58d6902-155e-b288-aeec-34f0613953b8 (at 10.8.11.9@o2ib6) in 176 seconds. I think it's dead, and I am evicting it. exp ffff9307172ec400, cur 1556915511 expire 1556915361 last 1556915335 May 03 13:31:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:34:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 13:34:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:40:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f288cf55-7f62-a214-4a98-fa3df403c209 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932a0e7b0c00, cur 1556916021 expire 1556915871 last 1556915794 May 03 13:40:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:40:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 13:40:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:49:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2d89ecf0-6152-5faf-bd31-446a85cf5283 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e3ca97c00, cur 1556916574 expire 1556916424 last 1556916347 May 03 13:49:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 13:50:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 14:08:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 03 14:08:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.63@o2ib4) May 03 14:08:49 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 03 14:09:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 03 14:09:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:09:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.63@o2ib4) May 03 14:16:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 75690e50-9cef-96d9-35c3-91f964088184 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930ff5bc3000, cur 1556918162 expire 1556918012 last 1556917935 May 03 14:16:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:16:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 14:16:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:19:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3e2d5c79-3c29-4573-80fd-0479f8ebef28 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930f6e933000, cur 1556918351 expire 1556918201 last 1556918124 May 03 14:19:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:19:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 14:19:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:26:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 14:26:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:26:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1d73ea3f-7fa9-4145-50db-a61a07e9088c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930a94e62000, cur 1556918774 expire 1556918624 last 1556918547 May 03 14:26:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:30:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 946a91e3-4e18-fe36-43ed-463403d1cb75 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93254161e800, cur 1556919016 expire 1556918866 last 1556918789 May 03 14:30:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:33:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 14:33:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:38:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 14:38:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:39:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6671c8b6-b08b-7fef-dcfa-9fd820277abe (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e6167b800, cur 1556919551 expire 1556919401 last 1556919324 May 03 14:39:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:40:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 03 14:40:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.51@o2ib4) May 03 14:41:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 03 14:41:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 03 14:41:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:42:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 03 14:42:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.101.51@o2ib4) May 03 14:42:50 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 14:43:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 03 14:46:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 
20b2697a-0a05-414b-bb91-1b15e0613d04 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9303007e0000, cur 1556919979 expire 1556919829 last 1556919752 May 03 14:46:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:58:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 3a7b3379-8114-b28e-8d07-28bf2279cfcb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930af9344800, cur 1556920731 expire 1556920581 last 1556920504 May 03 14:58:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 14:58:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3a7b3379-8114-b28e-8d07-28bf2279cfcb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930792b21400, cur 1556920739 expire 1556920589 last 1556920512 May 03 14:59:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 14:59:11 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 03 15:10:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c8ba4227-c835-d772-68cd-176ab2dba0c2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930ffabaf400, cur 1556921413 expire 1556921263 last 1556921186 May 03 15:11:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 03 15:11:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:13:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 15:13:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:16:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client a38f4e83-8e75-96b3-cd41-7cb0e8786771 (at 10.8.10.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930e13273800, cur 1556921787 expire 1556921637 last 1556921560 May 03 15:16:27 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 03 15:17:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 15:17:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:18:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.9@o2ib6) May 03 15:18:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:25:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f707532b-fbf3-a25a-213f-6ee216d9e3ee (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b8ce1dc00, cur 1556922356 expire 1556922206 last 1556922129 May 03 15:25:56 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 15:25:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 03 15:25:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:27:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 03 15:27:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client eac8837a-ac82-cdac-3334-a9d3b43c4206 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931c22676800, cur 1556923231 expire 1556923081 last 1556923004 May 03 15:40:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:41:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 15:41:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:52:32 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556923945/real 1556923945] req@ffff930551720c00 x1632276383355696/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556923952 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 15:52:32 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages May 03 15:52:40 fir-md1-s2 kernel: Lustre: 122649:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93009357f500 x1631559001212640/t0(0) o101->c1d9f0f7-d490-e556-ed11-756e6b122018@10.9.104.22@o2ib4:15/0 lens 1784/3288 e 1 to 0 dl 1556923965 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:52:41 fir-md1-s2 kernel: Lustre: 122699:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930e152f6300 x1631551492115632/t0(0) o101->62f2b488-84b8-fb8d-fa79-5fe434be1423@10.8.1.35@o2ib6:16/0 lens 576/3264 e 1 to 0 dl 1556923966 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:52:41 fir-md1-s2 kernel: Lustre: 122699:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages May 03 15:52:42 fir-md1-s2 kernel: Lustre: 122317:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930c84b1f500 x1631560771341808/t0(0) o101->5f0dd240-53c1-516b-7224-272e9211f8ae@10.9.105.53@o2ib4:17/0 lens 576/3264 e 1 to 0 dl 1556923967 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:52:42 fir-md1-s2 kernel: 
Lustre: 122317:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages May 03 15:52:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c1d9f0f7-d490-e556-ed11-756e6b122018 (at 10.9.104.22@o2ib4) reconnecting May 03 15:52:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.104.22@o2ib4) May 03 15:52:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 4edad98c-7717-083c-e9f2-be247b482ea4 (at 10.8.10.3@o2ib6) reconnecting May 03 15:52:47 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 03 15:52:51 fir-md1-s2 kernel: Lustre: 122208:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931b206d7200 x1631751179162896/t0(0) o101->ba47e349-0702-f6f0-1080-5cd761e580b9@10.8.24.25@o2ib6:26/0 lens 576/3264 e 0 to 0 dl 1556923976 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:52:53 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556923966/real 1556923966] req@ffff930551720c00 x1632276383355696/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556923973 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 15:52:53 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 15:52:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 3b2a36c5-4777-020b-1160-e986348a2428 (at 10.8.21.1@o2ib6) reconnecting May 03 15:52:57 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 03 15:52:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.21.1@o2ib6) May 03 15:52:57 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages May 03 15:53:02 fir-md1-s2 kernel: Lustre: 122644:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9320229b2a00 x1631714028771728/t0(0) 
o101->25e6a98d-1523-4b7c-d720-65145c7958fc@10.8.11.10@o2ib6:7/0 lens 576/3264 e 0 to 0 dl 1556923987 ref 2 fl Interpret:/0/0 rc 0/0 May 03 15:53:02 fir-md1-s2 kernel: Lustre: 122644:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages May 03 15:53:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c1d9f0f7-d490-e556-ed11-756e6b122018 (at 10.9.104.22@o2ib4) reconnecting May 03 15:53:07 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages May 03 15:53:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.104.22@o2ib4) May 03 15:53:07 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 03 15:53:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 3ef17f0c-d35b-8428-c1da-c84a40a8bdbc (at 10.9.101.71@o2ib4) reconnecting May 03 15:53:28 fir-md1-s2 kernel: Lustre: Skipped 18 previous similar messages May 03 15:53:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.104.22@o2ib4) May 03 15:53:28 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages May 03 15:53:35 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556924008/real 1556924008] req@ffff930551720c00 x1632276383355696/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556924015 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 15:53:35 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 03 15:53:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 3ef17f0c-d35b-8428-c1da-c84a40a8bdbc (at 10.9.101.71@o2ib4) reconnecting May 03 15:53:49 fir-md1-s2 kernel: Lustre: Skipped 29 previous similar messages May 03 15:53:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.101.71@o2ib4) May 03 15:53:49 fir-md1-s2 kernel: Lustre: Skipped 31 previous similar messages May 03 15:53:55 fir-md1-s2 kernel: LustreError: 
9076:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556923945, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932feef04c80/0x1c35e9c1eb55b7f6 lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 122 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 9076 timeout: 0 lvb_type: 0 May 03 15:53:55 fir-md1-s2 kernel: LustreError: 9076:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages May 03 15:54:07 fir-md1-s2 kernel: LustreError: 122624:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556923957, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff931c7b38de80/0x1c35e9c1eb73bdcc lrc: 3/1,0 mode: --/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 122 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122624 timeout: 0 lvb_type: 0 May 03 15:54:07 fir-md1-s2 kernel: LustreError: 122624:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 29 previous similar messages May 03 15:54:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 3b2a36c5-4777-020b-1160-e986348a2428 (at 10.8.21.1@o2ib6) reconnecting May 03 15:54:31 fir-md1-s2 kernel: Lustre: Skipped 48 previous similar messages May 03 15:54:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.21.1@o2ib6) May 03 15:54:31 fir-md1-s2 kernel: Lustre: Skipped 47 previous similar messages May 03 15:54:52 fir-md1-s2 kernel: Lustre: 122178:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556924085/real 1556924085] req@ffff930551720c00 x1632276383355696/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556924092 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 15:54:52 fir-md1-s2 kernel: Lustre: 
122178:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 03 15:54:59 fir-md1-s2 kernel: LustreError: 122178:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff930551720c00 x1632276383355696 status 0 rc -110), evict it ns: mdt-fir-MDT0003_UUID lock: ffff932ded2d7740/0x1c35e9c1e9668db3 lrc: 4/0,0 mode: PR/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 122 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x5d92e8ccd46cdcea expref: 331 pid: 122047 timeout: 449480 lvb_type: 0 May 03 15:54:59 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 03 15:54:59 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message May 03 15:54:59 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff932ded2d7740/0x1c35e9c1e9668db3 lrc: 3/0,0 mode: PR/PR res: [0x280000ddf:0x1282:0x0].0x0 bits 0x13/0x0 rrc: 122 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0x5d92e8ccd46cdcea expref: 332 pid: 122047 timeout: 0 lvb_type: 0 May 03 15:55:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1ecffc4c-5025-d47d-e2cd-93895258213b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930f08b2dc00, cur 1556924123 expire 1556923973 last 1556923896 May 03 15:55:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 15:58:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 15:58:24 fir-md1-s2 kernel: Lustre: Skipped 49 previous similar messages May 03 16:03:21 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:06:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client deb77812-f584-b192-ea35-1d7dd75e984c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9314a7f25c00, cur 1556924809 expire 1556924659 last 1556924582 May 03 16:06:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client deb77812-f584-b192-ea35-1d7dd75e984c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930fbe601400, cur 1556924816 expire 1556924666 last 1556924589 May 03 16:06:56 fir-md1-s2 kernel: Lustre: 122030:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556924809/real 1556924809] req@ffff9307a9f8aa00 x1632276535345744/t0(0) o106->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556924816 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 16:06:56 fir-md1-s2 kernel: Lustre: 122030:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 03 16:07:46 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:08:12 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:08:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 
16:08:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 16:12:25 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:16:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 45edec13-ee3c-8a6a-dcd7-fc8dcea1b264 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9318cb24b800, cur 1556925393 expire 1556925243 last 1556925166 May 03 16:16:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 45edec13-ee3c-8a6a-dcd7-fc8dcea1b264 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930223b44c00, cur 1556925416 expire 1556925266 last 1556925189 May 03 16:17:46 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:20:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 75a42419-1c36-3d84-69b0-0982bb5ad919 (at 10.9.101.63@o2ib4) reconnecting May 03 16:20:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.63@o2ib4) May 03 16:20:20 fir-md1-s2 kernel: Lustre: Skipped 49 previous similar messages May 03 16:22:51 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:25:13 fir-md1-s2 kernel: Lustre: 121650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556925906/real 1556925906] req@ffff9307ee21d700 x1632276718756208/t0(0) o106->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556925913 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 16:25:21 fir-md1-s2 kernel: Lustre: 122708:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930ff275e000 x1631542245316848/t0(0) 
o101->2e1837bb-385a-af64-a5d1-7a58230af8b2@10.9.0.64@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1556925926 ref 2 fl Interpret:/0/0 rc 0/0 May 03 16:25:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2e1837bb-385a-af64-a5d1-7a58230af8b2 (at 10.9.0.64@o2ib4) reconnecting May 03 16:25:55 fir-md1-s2 kernel: Lustre: 121650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556925948/real 1556925948] req@ffff9307ee21d700 x1632276718756208/t0(0) o106->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556925955 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 16:25:55 fir-md1-s2 kernel: Lustre: 121650:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 03 16:26:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2e1837bb-385a-af64-a5d1-7a58230af8b2 (at 10.9.0.64@o2ib4) reconnecting May 03 16:27:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2e1837bb-385a-af64-a5d1-7a58230af8b2 (at 10.9.0.64@o2ib4) reconnecting May 03 16:27:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 16:27:12 fir-md1-s2 kernel: Lustre: 121650:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556926025/real 1556926025] req@ffff9307ee21d700 x1632276718756208/t0(0) o106->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1556926032 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 16:27:12 fir-md1-s2 kernel: Lustre: 121650:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 03 16:28:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 77f69608-fd9a-eaf7-473b-9ae8ad4455d7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930415e7c400, cur 1556926083 expire 1556925933 last 1556925856 May 03 16:28:21 fir-md1-s2 kernel: Lustre: 122290:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9324f0a2f200 x1631801161730320/t0(0) o101->9e8351aa-818b-5dd3-be4c-f819d6475d71@10.8.23.13@o2ib6:26/0 lens 480/568 e 1 to 0 dl 1556926106 ref 2 fl Interpret:/0/0 rc 0/0 May 03 16:28:26 fir-md1-s2 kernel: LNet: Service thread pid 121650 was inactive for 200.39s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: May 03 16:28:26 fir-md1-s2 kernel: Pid: 121650, comm: mdt00_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 May 03 16:28:26 fir-md1-s2 kernel: Call Trace: May 03 16:28:26 fir-md1-s2 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt] May 03 16:28:26 fir-md1-s2 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt] May 03 16:28:26 fir-md1-s2 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt] May 03 16:28:26 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] May 03 16:28:26 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] May 03 16:28:26 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 May 03 16:28:26 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 May 03 16:28:26 fir-md1-s2 kernel: [] 0xffffffffffffffff May 03 16:28:26 fir-md1-s2 
kernel: LustreError: dumping log to /tmp/lustre-log.1556926106.121650 May 03 16:28:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 77f69608-fd9a-eaf7-473b-9ae8ad4455d7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9324f1795c00, cur 1556926107 expire 1556925957 last 1556925880 May 03 16:28:27 fir-md1-s2 kernel: Lustre: 122115:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:1s); client may timeout. req@ffff9324f0a2f200 x1631801161730320/t0(0) o101->9e8351aa-818b-5dd3-be4c-f819d6475d71@10.8.23.13@o2ib6:26/0 lens 480/536 e 1 to 0 dl 1556926106 ref 1 fl Complete:/0/0 rc 301/301 May 03 16:28:27 fir-md1-s2 kernel: LNet: Service thread pid 121650 completed after 200.74s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 03 16:28:27 fir-md1-s2 kernel: Lustre: 122115:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message May 03 16:33:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 16:33:46 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages May 03 16:36:04 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:39:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 86121a96-c964-720d-4196-e7e8af9a6c1b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931c68b1dc00, cur 1556926781 expire 1556926631 last 1556926554 May 03 16:40:40 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:42:26 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 16:42:52 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 17:00:30 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 17:04:31 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556928264/real 1556928264] req@ffff930225aa1800 x1632277126759920/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556928271 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 03 17:04:31 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages May 03 17:04:39 fir-md1-s2 kernel: Lustre: 122067:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930ce1ae6c00 x1631772477956912/t0(0) o36->f33e431f-0112-1f2a-098b-009723ad3df8@10.8.30.21@o2ib6:14/0 lens 552/2888 e 1 to 0 dl 1556928284 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:04:40 fir-md1-s2 kernel: Lustre: 122067:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930e39392100 x1631542259198704/t0(0) o101->2e1837bb-385a-af64-a5d1-7a58230af8b2@10.9.0.64@o2ib4:15/0 lens 576/3264 e 1 to 0 dl 1556928285 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:04:40 fir-md1-s2 kernel: Lustre: 122067:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 170 previous similar messages May 03 17:04:45 
fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 856c8010-43d2-a1a9-d163-b7373c0c35f7 (at 10.8.30.16@o2ib6) reconnecting May 03 17:04:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.21.5@o2ib6) May 03 17:04:45 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 03 17:04:45 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 03 17:04:52 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556928285/real 1556928285] req@ffff930225aa1800 x1632277126759920/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556928292 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 17:04:52 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 03 17:04:53 fir-md1-s2 kernel: Lustre: 122116:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93105248ad00 x1631551941410464/t0(0) o101->43966e6e-8d66-2608-a0c9-5eff31e855bc@10.8.30.9@o2ib6:28/0 lens 576/3264 e 0 to 0 dl 1556928298 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:04:55 fir-md1-s2 kernel: Lustre: 122346:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931e26b91200 x1631551941411280/t0(0) o101->43966e6e-8d66-2608-a0c9-5eff31e855bc@10.8.30.9@o2ib6:0/0 lens 576/3264 e 0 to 0 dl 1556928300 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:04:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 43966e6e-8d66-2608-a0c9-5eff31e855bc (at 10.8.30.9@o2ib6) reconnecting May 03 17:04:59 fir-md1-s2 kernel: Lustre: Skipped 148 previous similar messages May 03 17:04:59 fir-md1-s2 kernel: Lustre: 122159:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93196d7bad00 x1631808481613280/t0(0) o101->3b8583db-9b25-3a84-4b44-4c626faa0d2b@10.8.30.13@o2ib6:4/0 lens 576/3264 
e 0 to 0 dl 1556928304 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:04:59 fir-md1-s2 kernel: Lustre: 122159:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 17:05:09 fir-md1-s2 kernel: Lustre: 122346:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9316663ab000 x1631945283513104/t0(0) o101->ca76b195-822f-9abb-2230-05894a3e9cc7@10.8.30.7@o2ib6:14/0 lens 576/3264 e 0 to 0 dl 1556928314 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:05:09 fir-md1-s2 kernel: Lustre: 122346:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 03 17:05:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 856c8010-43d2-a1a9-d163-b7373c0c35f7 (at 10.8.30.16@o2ib6) reconnecting May 03 17:05:27 fir-md1-s2 kernel: Lustre: Skipped 152 previous similar messages May 03 17:05:27 fir-md1-s2 kernel: Lustre: 122094:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-18), not sending early reply req@ffff931d01210900 x1631808481620272/t0(0) o101->3b8583db-9b25-3a84-4b44-4c626faa0d2b@10.8.30.13@o2ib6:2/0 lens 576/3264 e 0 to 0 dl 1556928332 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:05:27 fir-md1-s2 kernel: Lustre: 122094:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 03 17:05:34 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556928327/real 1556928327] req@ffff930225aa1800 x1632277126759920/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556928334 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 17:05:34 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 03 17:05:54 fir-md1-s2 kernel: LustreError: 122136:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556928264, 90s ago); not entering recovery in server 
code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff93201c272880/0x1c35e9c2a1294c3e lrc: 3/1,0 mode: --/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 248 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122136 timeout: 0 lvb_type: 0 May 03 17:05:54 fir-md1-s2 kernel: LustreError: 122136:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 111 previous similar messages May 03 17:05:55 fir-md1-s2 kernel: LustreError: 122291:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556928265, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff930d7ffa5c40/0x1c35e9c2a12a485a lrc: 3/1,0 mode: --/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 248 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122291 timeout: 0 lvb_type: 0 May 03 17:05:55 fir-md1-s2 kernel: LustreError: 122291:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 52 previous similar messages May 03 17:05:58 fir-md1-s2 kernel: LustreError: 122048:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556928268, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff93175423cec0/0x1c35e9c2a12ae2df lrc: 3/1,0 mode: --/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 248 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122048 timeout: 0 lvb_type: 0 May 03 17:06:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.30.9@o2ib6) May 03 17:06:01 fir-md1-s2 kernel: Lustre: Skipped 605 previous similar messages May 03 17:06:04 fir-md1-s2 kernel: LustreError: 122662:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556928274, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: 
ffff93174aaa6780/0x1c35e9c2a12c5760 lrc: 3/1,0 mode: --/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 250 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122662 timeout: 0 lvb_type: 0 May 03 17:06:04 fir-md1-s2 kernel: LustreError: 122662:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages May 03 17:06:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 170d6268-ca7a-a7d0-0083-35fb42e90690 (at 10.8.7.33@o2ib6) reconnecting May 03 17:06:09 fir-md1-s2 kernel: Lustre: Skipped 304 previous similar messages May 03 17:06:14 fir-md1-s2 kernel: LustreError: 122204:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556928284, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff931b93704140/0x1c35e9c2a12eb6ae lrc: 3/1,0 mode: --/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 252 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122204 timeout: 0 lvb_type: 0 May 03 17:06:14 fir-md1-s2 kernel: LustreError: 122204:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages May 03 17:06:31 fir-md1-s2 kernel: Lustre: 122220:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9319cbe73c00 x1631551941435376/t0(0) o101->43966e6e-8d66-2608-a0c9-5eff31e855bc@10.8.30.9@o2ib6:6/0 lens 576/3264 e 0 to 0 dl 1556928396 ref 2 fl Interpret:/0/0 rc 0/0 May 03 17:06:31 fir-md1-s2 kernel: Lustre: 122220:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 03 17:06:33 fir-md1-s2 kernel: LustreError: 122224:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1556928303, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932b3ea0b840/0x1c35e9c2a135219a lrc: 3/1,0 mode: --/PR res: 
[0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 252 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122224 timeout: 0 lvb_type: 0 May 03 17:06:33 fir-md1-s2 kernel: LustreError: 122224:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages May 03 17:06:51 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556928404/real 1556928404] req@ffff930225aa1800 x1632277126759920/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1556928411 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 03 17:06:51 fir-md1-s2 kernel: Lustre: 122712:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 03 17:06:58 fir-md1-s2 kernel: LustreError: 122712:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.4@o2ib6) failed to reply to blocking AST (req@ffff930225aa1800 x1632277126759920 status 0 rc -110), evict it ns: mdt-fir-MDT0003_UUID lock: ffff931705bada00/0x1c35e9c2a0fe2389 lrc: 4/0,0 mode: PR/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 252 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xaa0e8b895edb0ac9 expref: 186 pid: 122637 timeout: 453798 lvb_type: 0 May 03 17:06:58 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.26.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 03 17:06:58 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.26.4@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff931705bada00/0x1c35e9c2a0fe2389 lrc: 3/0,0 mode: PR/PR res: [0x280006256:0x53d0:0x0].0x0 bits 0x13/0x0 rrc: 252 type: IBT flags: 0x60200400000020 nid: 10.8.26.4@o2ib6 remote: 0xaa0e8b895edb0ac9 expref: 187 pid: 122637 timeout: 0 lvb_type: 0 May 03 17:07:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 
c5fe7dc5-6be6-753c-1bfb-3059cbdab4cc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932efd2f6c00, cur 1556928475 expire 1556928325 last 1556928248 May 03 17:07:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 17:27:28 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 17:31:08 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 17:56:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cbd0bde2-1d67-c3c9-d9e1-2825c9dc4656 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933ec9abf000, cur 1556931367 expire 1556931217 last 1556931140 May 03 17:56:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 03 17:56:23 fir-md1-s2 kernel: Lustre: Skipped 458 previous similar messages May 03 18:01:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a327e393-246e-f0b0-a4c7-257350ff9a2e (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93307c826800, cur 1556931693 expire 1556931543 last 1556931466 May 03 18:01:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 18:11:17 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 18:20:14 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 03 22:26:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 50eec116-2bbe-0010-8e94-90a3b26eabed (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93074bf70c00, cur 1556947612 expire 1556947462 last 1556947385 May 03 22:26:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 22:27:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.26.33@o2ib6) May 03 22:27:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 23:28:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 71191044-c535-199c-f761-3a1e66b979bf (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932bb639e400, cur 1556951286 expire 1556951136 last 1556951059 May 03 23:28:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 23:28:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.9.8@o2ib6) May 03 23:28:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 03 23:29:12 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 00:52:13 fir-md1-s2 kernel: Lustre: 122117:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1556956326/real 1556956326] req@ffff9319e0a19b00 x1632282037418896/t0(0) o104->fir-MDT0003@10.9.109.11@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1556956333 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 00:52:13 fir-md1-s2 kernel: Lustre: 122117:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 00:52:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d48933ec-1431-e44d-6812-3036ccaf11ec (at 10.9.108.70@o2ib4) reconnecting May 04 00:52:15 fir-md1-s2 kernel: Lustre: Skipped 452 previous similar messages May 04 00:52:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.108.70@o2ib4) May 04 00:52:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.109.11@o2ib4) May 04 01:09:30 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is 
in inconsistent state, don't perform health checking (-125, 0) May 04 01:09:30 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 8 previous similar messages May 04 01:09:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client e4236d6f-2184-c4d2-47bf-69ed0b075b31 (at 10.8.28.10@o2ib6) reconnecting May 04 01:09:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 01:09:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.28.10@o2ib6) May 04 01:10:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d930e61d-e56c-e6c2-e16a-4d6a026ada3e (at 10.9.107.7@o2ib4) reconnecting May 04 01:10:00 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 04 01:10:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.107.7@o2ib4) May 04 01:10:00 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 04 01:16:59 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 01:32:01 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 01:33:30 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 02:06:52 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 02:17:52 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 05:18:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 383440be-7ca6-4d43-d52b-759fc8a58b5d (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930731319000, cur 1556972297 expire 1556972147 last 1556972070 May 04 05:18:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 05:20:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.54@o2ib4) May 04 05:20:22 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 04 06:04:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9a943a6e-ba71-6a16-d3e1-22dafa089e55 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932dddf30000, cur 1556975059 expire 1556974909 last 1556974832 May 04 06:04:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 06:06:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.54@o2ib4) May 04 06:06:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 07:05:40 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a3388f54-b8c8-a60b-6b5d-35c7209cd40f (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931c71a6e000, cur 1556978740 expire 1556978590 last 1556978513 May 04 07:05:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 07:07:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.54@o2ib4) May 04 07:07:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 07:24:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.13.24@o2ib6) May 04 07:24:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 07:25:40 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 88c93a95-f5ad-6110-cc93-c62767675453 (at 10.8.13.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9318fda56c00, cur 1556979940 expire 1556979790 last 1556979713 May 04 07:25:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 07:54:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4d1ae96a-bfd7-a4f6-e4dd-c47a6ccecff7 (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933d22a51c00, cur 1556981659 expire 1556981509 last 1556981432 May 04 07:54:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 07:56:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.106.54@o2ib4) May 04 07:56:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 10:57:47 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 13:31:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:31:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:32:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 767b9887-4949-9409-f873-995c28218891 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff934069410400, cur 1557001933 expire 1557001783 last 1557001706 May 04 13:32:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:35:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:35:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:35:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a600d77e-56e0-8ca3-ef8c-c1da33b325f1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93169d63a000, cur 1557002154 expire 1557002004 last 1557001927 May 04 13:35:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:43:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f553bf50-c028-a97e-9410-fa196fae757f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932d75eb3400, cur 1557002601 expire 1557002451 last 1557002374 May 04 13:43:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:43:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:43:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:51:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 62239d4a-6992-768d-a1cc-2d98661c092d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93285aec8400, cur 1557003081 expire 1557002931 last 1557002854 May 04 13:51:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:51:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:51:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:57:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7f2bb20f-4f02-afda-1d89-8b69b6f31ac2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93286cb9f800, cur 1557003436 expire 1557003286 last 1557003209 May 04 13:57:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 13:57:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 13:57:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:04:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 14:04:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:05:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7dedc3aa-cf85-daf7-9fd3-ada2338e6ac1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932bf4620c00, cur 1557003948 expire 1557003798 last 1557003721 May 04 14:05:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:10:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 72196ffb-9a76-4842-d2d7-d8802bd689be (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315fcee8800, cur 1557004203 expire 1557004053 last 1557003976 May 04 14:10:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:10:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 14:10:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:17:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d53c58de-f8cb-7814-75c1-fc52447be8d8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932fb22d4800, cur 1557004669 expire 1557004519 last 1557004442 May 04 14:17:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:18:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 14:18:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 14:43:29 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 16:04:54 fir-md1-s2 kernel: Lustre: 122128:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557011087/real 1557011087] req@ffff930ccf3ae600 x1632291776448144/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557011094 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 16:04:54 fir-md1-s2 kernel: Lustre: 122128:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 16:05:01 fir-md1-s2 kernel: Lustre: 122128:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557011094/real 1557011094] req@ffff930ccf3ae600 x1632291776448144/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557011101 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:05:02 fir-md1-s2 kernel: Lustre: 121995:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930891271b00 x1632098106420560/t0(0) o101->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:7/0 lens 480/568 e 1 to 0 dl 1557011107 ref 2 fl Interpret:/0/0 rc 0/0 May 04 16:05:08 fir-md1-s2 kernel: Lustre: 122128:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557011101/real 1557011101] req@ffff930ccf3ae600 x1632291776448144/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557011108 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:05:09 fir-md1-s2 kernel: Lustre: 
fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 16:05:09 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 04 16:05:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.0.10.3@o2ib7) May 04 16:05:22 fir-md1-s2 kernel: Lustre: 122128:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557011115/real 1557011115] req@ffff930ccf3ae600 x1632291776448144/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557011122 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 16:05:22 fir-md1-s2 kernel: Lustre: 122128:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 16:05:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6b6c38b2-4fd6-8d37-9524-d4553bfeb828 (at 10.0.10.3@o2ib7) reconnecting May 04 16:05:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.0.10.3@o2ib7) May 04 16:05:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 46347b9e-ebe9-507e-4869-8b66fcad2413 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932bd7268000, cur 1557011132 expire 1557010982 last 1557010905 May 04 16:05:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:09:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:09:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:33:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7444eeb8-aa34-ee96-aeea-02f153ed19ac (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b3fa98400, cur 1557012784 expire 1557012634 last 1557012557 May 04 16:33:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:33:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:33:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:37:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:37:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:37:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 577d49cc-46ec-97b8-1139-c4b0a8b80cef (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9307332dc400, cur 1557013039 expire 1557012889 last 1557012812 May 04 16:37:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:49:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f7224363-eb38-b009-9e90-8b8b47f8518e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933046af8c00, cur 1557013789 expire 1557013639 last 1557013562 May 04 16:49:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:50:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 16:50:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 16:59:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client dde38836-d5bf-8e01-0932-9c2ac1c28821 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933035740800, cur 1557014382 expire 1557014232 last 1557014155 May 04 16:59:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:00:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:00:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:25:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 93205d60-586e-3744-ce33-b1b459643386 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931df9a11c00, cur 1557015950 expire 1557015800 last 1557015723 May 04 17:25:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:26:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:26:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:33:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 511a286a-7555-3179-23d0-2620081196ae (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931dd9727000, cur 1557016394 expire 1557016244 last 1557016167 May 04 17:33:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:33:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:33:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:44:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 54ad9af7-50a0-90a2-7f16-e45b723f531c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b45e43000, cur 1557017096 expire 1557016946 last 1557016869 May 04 17:44:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:45:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:45:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:53:06 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557017579/real 1557017579] req@ffff93186571d400 x1632292957266224/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557017586 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 17:53:06 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 17:53:13 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557017586/real 1557017586] req@ffff93186571d400 x1632292957266224/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557017593 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 17:53:14 fir-md1-s2 kernel: Lustre: 122645:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93156df93300 x1631286650026304/t0(0) o36->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:19/0 lens 512/448 e 1 to 0 dl 1557017599 ref 2 fl Interpret:/0/0 rc 0/0 May 04 17:53:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 17:53:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 17:53:20 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557017593/real 1557017593] req@ffff93186571d400 x1632292957266224/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557017600 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 May 04 17:53:34 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557017607/real 1557017607] req@ffff93186571d400 x1632292957266224/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557017614 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 17:53:34 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 17:53:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 17:53:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 17:53:55 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557017628/real 1557017628] req@ffff93186571d400 x1632292957266224/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557017635 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 17:53:55 fir-md1-s2 kernel: Lustre: 121653:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 04 17:54:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 17:54:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 17:54:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1239aa75-3a7b-000a-099b-e23fd7e6fded (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930d8b263c00, cur 1557017648 expire 1557017498 last 1557017421 May 04 17:54:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 17:57:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 17:57:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 18:06:47 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557018400/real 1557018400] req@ffff930f083d2700 x1632293109787984/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557018407 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 18:06:47 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 18:06:54 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557018407/real 1557018407] req@ffff930f083d2700 x1632293109787984/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557018414 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 18:06:55 fir-md1-s2 kernel: Lustre: 122026:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9308f638ce00 x1631286650137840/t0(0) o36->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:0/0 lens 512/448 e 1 to 0 dl 1557018420 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:07:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 18:07:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 18:07:08 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557018421/real 1557018421] req@ffff930f083d2700 x1632293109787984/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557018428 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 May 04 18:07:08 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 18:07:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 18:07:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 18:07:29 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557018442/real 1557018442] req@ffff930f083d2700 x1632293109787984/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557018449 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 04 18:07:29 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 04 18:07:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cda96f87-639d-9cf0-7897-0ebd87a2d1c8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9315b22fac00, cur 1557018455 expire 1557018305 last 1557018228 May 04 18:07:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 18:07:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 18:07:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 18:16:55 fir-md1-s2 kernel: Lustre: 121653:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9316732db000 x1631286650226704/t0(0) o36->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:0/0 lens 512/448 e 1 to 0 dl 1557019020 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:17:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 18:17:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 18:17:11 fir-md1-s2 kernel: Lustre: 122186:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557019000/real 1557019000] req@ffff9316df730900 x1632293217708992/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557019031 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 18:17:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 18:17:42 fir-md1-s2 kernel: LustreError: 122186:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff9316df730900 x1632293217708992 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff932d7a6018c0/0x1c35e9d0f0db1ba6 lrc: 4/0,0 mode: PR/PR res: [0x2800266da:0xfc6a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x9ea59d7a65c97692 expref: 19 pid: 122217 timeout: 544449 lvb_type: 0 May 04 18:17:42 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.27.23@o2ib6 was evicted due to a 
lock blocking callback time out: rc -107 May 04 18:17:42 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 62s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff932d7a6018c0/0x1c35e9d0f0db1ba6 lrc: 3/0,0 mode: PR/PR res: [0x2800266da:0xfc6a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x9ea59d7a65c97692 expref: 20 pid: 122217 timeout: 0 lvb_type: 0 May 04 18:18:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 52867129-fbd8-a539-d444-366a85e4c09b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93187832ac00, cur 1557019085 expire 1557018935 last 1557018858 May 04 18:18:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 18:26:23 fir-md1-s2 kernel: Lustre: 122117:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557019576/real 1557019576] req@ffff9316f8704800 x1632293323021104/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557019583 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 18:26:23 fir-md1-s2 kernel: Lustre: 122117:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 18:26:31 fir-md1-s2 kernel: Lustre: 122623:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93184fa60c00 x1631286650311216/t0(0) o36->6e0b1c17-2142-9190-acc8-624208298012@10.8.8.17@o2ib6:6/0 lens 512/448 e 1 to 0 dl 1557019596 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:26:37 fir-md1-s2 kernel: LustreError: 122117:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff9316f8704800 x1632293323021104 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff932749a1f500/0x1c35e9d109ad4437 lrc: 4/0,0 mode: PR/PR res: [0x2800266da:0xfc70:0x0].0x0 bits 
0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x6e26aa4174a4c1c0 expref: 18 pid: 122306 timeout: 544984 lvb_type: 0 May 04 18:26:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 6e0b1c17-2142-9190-acc8-624208298012 (at 10.8.8.17@o2ib6) reconnecting May 04 18:26:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.8.17@o2ib6) May 04 18:26:37 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 04 18:26:37 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 04 18:26:37 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 21s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff932749a1f500/0x1c35e9d109ad4437 lrc: 3/0,0 mode: PR/PR res: [0x2800266da:0xfc70:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x6e26aa4174a4c1c0 expref: 19 pid: 122306 timeout: 0 lvb_type: 0 May 04 18:27:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4217312c-de7d-699c-df37-b7b9ed79d1d2 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933ff1580800, cur 1557019660 expire 1557019510 last 1557019433 May 04 18:34:31 fir-md1-s2 kernel: Lustre: 122355:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557020064/real 1557020064] req@ffff93189471aa00 x1632293413673424/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557020071 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 18:34:31 fir-md1-s2 kernel: Lustre: 122355:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 04 18:34:39 fir-md1-s2 kernel: Lustre: 122658:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9316df731e00 x1631607329260848/t0(0) o36->0d6f0e39-2d80-1899-feb1-caec05c1c7f0@10.8.2.23@o2ib6:14/0 lens 512/448 e 1 to 0 dl 1557020084 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:34:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0d6f0e39-2d80-1899-feb1-caec05c1c7f0 (at 10.8.2.23@o2ib6) reconnecting May 04 18:34:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.2.23@o2ib6) May 04 18:34:45 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 04 18:35:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0d6f0e39-2d80-1899-feb1-caec05c1c7f0 (at 10.8.2.23@o2ib6) reconnecting May 04 18:35:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2c083488-2dde-8148-1c71-2773a71ca78d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931ee0b1a000, cur 1557020139 expire 1557019989 last 1557019912 May 04 18:52:58 fir-md1-s2 kernel: Lustre: 122276:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931403f01b00 x1631546166208000/t0(0) o101->c5d29146-8e69-99bb-85ae-0e928604facc@10.8.0.68@o2ib6:3/0 lens 1784/3288 e 1 to 0 dl 1557021183 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:53:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c5d29146-8e69-99bb-85ae-0e928604facc (at 10.8.0.68@o2ib6) reconnecting May 04 18:53:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 18:53:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to c5d29146-8e69-99bb-85ae-0e928604facc (at 10.8.0.68@o2ib6) May 04 18:53:04 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 04 18:53:08 fir-md1-s2 kernel: Lustre: 122004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930e133b9800 x1631730914251888/t0(0) o101->d4d733ff-8d4b-d8de-bbc6-b5ae7cc529ba@10.8.12.35@o2ib6:13/0 lens 592/3264 e 1 to 0 dl 1557021193 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:53:14 fir-md1-s2 kernel: Lustre: 122350:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557021163/real 1557021163] req@ffff931935644e00 x1632293610103616/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557021194 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 18:53:14 fir-md1-s2 kernel: Lustre: 122350:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 04 18:53:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client d4d733ff-8d4b-d8de-bbc6-b5ae7cc529ba (at 10.8.12.35@o2ib6) reconnecting May 04 18:53:16 fir-md1-s2 kernel: Lustre: 122660:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930cc2ef7200 x1631684012171680/t0(0) 
o101->7c55e6d6-417c-9880-1f82-80244c9a7c6b@10.8.19.3@o2ib6:21/0 lens 592/3264 e 1 to 0 dl 1557021201 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:53:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client c5d29146-8e69-99bb-85ae-0e928604facc (at 10.8.0.68@o2ib6) reconnecting May 04 18:53:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 18:53:26 fir-md1-s2 kernel: Lustre: 122151:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932824394e00 x1631686959953808/t0(0) o101->e069e613-f413-14c2-adc9-8bb2c0565535@10.8.20.30@o2ib6:1/0 lens 592/3264 e 0 to 0 dl 1557021211 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:53:37 fir-md1-s2 kernel: Lustre: 121419:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932d4aac4200 x1631321753004480/t0(0) o101->fafb3280-fd8a-565a-20cd-97f85a227ff6@10.8.11.31@o2ib6:12/0 lens 592/3264 e 0 to 0 dl 1557021222 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:53:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client fafb3280-fd8a-565a-20cd-97f85a227ff6 (at 10.8.11.31@o2ib6) reconnecting May 04 18:53:43 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 04 18:53:54 fir-md1-s2 kernel: Lustre: 122024:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932069a68000 x1632257296222928/t0(0) o101->1f6dcbe1-0bdc-a36f-a698-e7085eab26b7@10.8.11.7@o2ib6:29/0 lens 592/3264 e 0 to 0 dl 1557021239 ref 2 fl Interpret:/0/0 rc 0/0 May 04 18:53:54 fir-md1-s2 kernel: Lustre: 122024:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 04 18:54:16 fir-md1-s2 kernel: Lustre: 122350:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557021225/real 1557021225] req@ffff931935644e00 x1632293610103616/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557021256 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 May 04 18:54:16 fir-md1-s2 kernel: Lustre: 122350:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 18:54:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client bb0a5132-bb89-b076-3d1c-a0a716c38321 (at 10.8.12.3@o2ib6) reconnecting May 04 18:54:17 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages May 04 18:54:23 fir-md1-s2 kernel: LustreError: 121420:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557021173, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9327a13e4ec0/0x1c35e9d151a4e612 lrc: 3/1,0 mode: --/PR res: [0x280000e42:0x53e:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 121420 timeout: 0 lvb_type: 0 May 04 18:54:31 fir-md1-s2 kernel: LustreError: 9074:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557021181, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9317dda98fc0/0x1c35e9d151f3ed07 lrc: 3/1,0 mode: --/PR res: [0x280000e42:0x53e:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 9074 timeout: 0 lvb_type: 0 May 04 18:54:42 fir-md1-s2 kernel: LustreError: 9078:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557021192, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9329fefacec0/0x1c35e9d152673611 lrc: 3/1,0 mode: --/PR res: [0x280000e42:0x53e:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 9078 timeout: 0 lvb_type: 0 May 04 18:54:42 fir-md1-s2 kernel: LustreError: 9078:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages May 04 18:54:47 fir-md1-s2 kernel: LustreError: 
122350:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff931935644e00 x1632293610103616 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff931ffa7157c0/0x1c35e9d128970146 lrc: 4/0,0 mode: PR/PR res: [0x280000e42:0x53e:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7b55a62784a113ef expref: 13 pid: 122188 timeout: 546675 lvb_type: 0 May 04 18:54:47 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 04 18:54:47 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 124s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff931ffa7157c0/0x1c35e9d128970146 lrc: 3/0,0 mode: PR/PR res: [0x280000e42:0x53e:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x7b55a62784a113ef expref: 14 pid: 122188 timeout: 0 lvb_type: 0 May 04 18:55:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 04b19bfb-73c4-35aa-ab54-95d0fb967def (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9327102e3000, cur 1557021303 expire 1557021153 last 1557021076 May 04 18:55:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 19:07:13 fir-md1-s2 kernel: Lustre: 122647:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557022026/real 1557022026] req@ffff932058fca100 x1632293763110016/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557022033 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 04 19:07:13 fir-md1-s2 kernel: Lustre: 122647:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 04 19:07:13 fir-md1-s2 kernel: LustreError: 122647:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff932058fca100 x1632293763110016 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff93285bb91440/0x1c35e9d16a0f87b5 lrc: 4/0,0 mode: PR/PR res: [0x2800266da:0xfc5e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x546a2c422a620279 expref: 15 pid: 122683 timeout: 547421 lvb_type: 0 May 04 19:07:13 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 04 19:07:13 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 7s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff93285bb91440/0x1c35e9d16a0f87b5 lrc: 3/0,0 mode: PR/PR res: [0x2800266da:0xfc5e:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x546a2c422a620279 expref: 16 pid: 122683 timeout: 0 lvb_type: 0 May 04 19:08:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 44d99712-0a32-8224-fa27-023394f7a2e7 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932faf3e0400, cur 1557022095 expire 1557021945 last 1557021868 May 04 19:08:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 19:08:27 fir-md1-s2 kernel: Lustre: Skipped 31 previous similar messages May 04 19:25:35 fir-md1-s2 kernel: Lustre: 121650:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 May 04 20:00:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 924721f2-8b7a-d9e8-9d84-6d74a6d68bc4 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933053bce000, cur 1557025247 expire 1557025097 last 1557025020 May 04 20:01:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 20:01:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 21:02:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 14d8e923-c6f9-c89f-2108-7e07133a0534 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9327967fec00, cur 1557028963 expire 1557028813 last 1557028736 May 04 21:02:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 21:02:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 14d8e923-c6f9-c89f-2108-7e07133a0534 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933f9f21b800, cur 1557028979 expire 1557028829 last 1557028752 May 04 21:03:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:03:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 21:12:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d92925b0-6950-d73a-3c19-18f60b71f77c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93403e60a800, cur 1557029540 expire 1557029390 last 1557029313 May 04 21:12:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d92925b0-6950-d73a-3c19-18f60b71f77c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933078f00400, cur 1557029551 expire 1557029401 last 1557029324 May 04 21:12:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:12:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 21:42:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b10ee8b8-e94f-f5ea-853b-6331b42e15b7 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9314ca27b800, cur 1557031350 expire 1557031200 last 1557031123 May 04 21:42:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client b10ee8b8-e94f-f5ea-853b-6331b42e15b7 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e5ce32400, cur 1557031353 expire 1557031203 last 1557031126 May 04 21:42:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:42:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 21:49:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ef06e220-e09f-677f-2fdb-acf8938b16d5 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933baa23ec00, cur 1557031775 expire 1557031625 last 1557031548 May 04 21:49:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ef06e220-e09f-677f-2fdb-acf8938b16d5 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933d26a8c000, cur 1557031786 expire 1557031636 last 1557031559 May 04 21:49:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:49:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 21:59:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 2bf243cb-6f67-e42b-51f4-f6a7c5ee982b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931a5a26c800, cur 1557032354 expire 1557032204 last 1557032127 May 04 21:59:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 21:59:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 22:07:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 480555d5-5d64-d073-3639-0f5b2ce35e26 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e3d3c1800, cur 1557032842 expire 1557032692 last 1557032615 May 04 22:07:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 22:07:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 22:07:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 22:21:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7d1b2398-9a4a-11b4-6f97-80af175ca4ee (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931df4613800, cur 1557033708 expire 1557033558 last 1557033481 May 04 22:21:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 22:24:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 22:24:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 22:28:24 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 04 22:47:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f1e8f696-2dcb-fb2e-d28b-ace730b2fdeb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93077169a000, cur 1557035260 expire 1557035110 last 1557035033 May 04 22:47:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 22:48:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 22:48:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 23:38:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ec44671f-3d65-fa77-ff7c-0f9f8180fb37 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930094723800, cur 1557038331 expire 1557038181 last 1557038104 May 04 23:38:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 04 23:39:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ec44671f-3d65-fa77-ff7c-0f9f8180fb37 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9314752b5c00, cur 1557038346 expire 1557038196 last 1557038119 May 04 23:44:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 04 23:44:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 00:06:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4ac8fb02-ddf9-28e4-b563-927c1f401f4a (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff932777f3c400, cur 1557039982 expire 1557039832 last 1557039755 May 05 00:06:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 00:06:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 00:16:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 46ec5507-6f4e-bc2f-33b6-50ca0b2ab0b3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931feb3d0400, cur 1557040562 expire 1557040412 last 1557040335 May 05 00:16:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 00:16:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 46ec5507-6f4e-bc2f-33b6-50ca0b2ab0b3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932e0d796c00, cur 1557040566 expire 1557040416 last 1557040339 May 05 00:21:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 00:21:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 00:59:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 65f3208b-c8c0-19d5-f74e-5b534d0ff340 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933c3ea6d000, cur 1557043172 expire 1557043022 last 1557042945 May 05 00:59:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 00:59:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:06:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 672caf69-51af-50bf-c8ba-5b4846099a5d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9318a7201000, cur 1557043567 expire 1557043417 last 1557043340 May 05 01:06:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:06:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:06:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:14:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ab036b1d-a85e-d27e-e5fc-7955b853ddcc (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932ba8e50800, cur 1557044062 expire 1557043912 last 1557043835 May 05 01:14:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:14:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:14:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:22:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 33446197-b79e-e7f3-0d84-b0bbf23bc42f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e744b5000, cur 1557044525 expire 1557044375 last 1557044298 May 05 01:22:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:25:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:25:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:40:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 17a7a482-1271-33ea-b064-feb50332109c (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9315c3a26400, cur 1557045604 expire 1557045454 last 1557045377 May 05 01:40:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:40:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:45:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:45:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:46:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 101f9e20-27fc-7a6d-215b-5bbd3ac1ba7a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b4fb8d800, cur 1557046011 expire 1557045861 last 1557045784 May 05 01:46:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:53:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e7ca7b5c-b991-452d-a104-2b90c14ee66b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b4fbfb000, cur 1557046404 expire 1557046254 last 1557046177 May 05 01:53:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 01:53:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 01:53:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 02:00:40 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 03:30:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ce9e113d-e8f1-e106-02dd-4c3be89e6b69 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932bd3232800, cur 1557052223 expire 1557052073 last 1557051996 May 05 03:30:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 03:30:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 03:30:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 08:36:10 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 08:52:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client fe6e0c95-4050-3263-6441-0c8bd9611956 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930aa12bc000, cur 1557071538 expire 1557071388 last 1557071311 May 05 08:52:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 08:52:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fe6e0c95-4050-3263-6441-0c8bd9611956 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93161a355c00, cur 1557071549 expire 1557071399 last 1557071322 May 05 08:52:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 05 08:52:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 09:14:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 55455114-9841-d0dd-10fa-08a14775789b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931757f38400, cur 1557072852 expire 1557072702 last 1557072625 May 05 09:14:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 55455114-9841-d0dd-10fa-08a14775789b (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9329dbb82400, cur 1557072860 expire 1557072710 last 1557072633 May 05 09:14:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 05 09:14:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 09:58:26 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 10:05:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 10867627-0a3b-0ced-8cf7-ab7628cfde78 (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933e32247400, cur 1557075955 expire 1557075805 last 1557075728 May 05 12:07:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.2@o2ib6) May 05 12:07:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 12:08:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 46e0bcf2-fd2a-2954-a40d-64dcbd9e1b39 (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93105146fc00, cur 1557083311 expire 1557083161 last 1557083084 May 05 12:08:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 12:57:43 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 05 17:02:03 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100916/real 1557100916] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557100923 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 17:02:10 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100923/real 1557100923] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557100930 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:02:11 fir-md1-s2 kernel: Lustre: 122199:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931071ef8f00 x1631568280933664/t0(0) o36->75c31e1e-77de-1d06-3ba1-5bf70911b79e@10.9.104.58@o2ib4:16/0 lens 520/2888 e 1 to 0 dl 1557100936 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:02:11 fir-md1-s2 kernel: Lustre: 122199:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 17:02:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75c31e1e-77de-1d06-3ba1-5bf70911b79e (at 10.9.104.58@o2ib4) reconnecting May 05 17:02:17 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100930/real 1557100930] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557100937 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:02:17 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages May 05 
17:02:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.58@o2ib4) May 05 17:02:24 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100937/real 1557100937] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557100944 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:02:38 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100951/real 1557100951] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557100958 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:02:38 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 17:02:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75c31e1e-77de-1d06-3ba1-5bf70911b79e (at 10.9.104.58@o2ib4) reconnecting May 05 17:02:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.58@o2ib4) May 05 17:02:59 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557100972/real 1557100972] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557100979 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:02:59 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 17:02:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75c31e1e-77de-1d06-3ba1-5bf70911b79e (at 10.9.104.58@o2ib4) reconnecting May 05 17:02:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.58@o2ib4) May 05 17:03:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.58@o2ib4) May 05 17:03:41 fir-md1-s2 kernel: Lustre: 
122308:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557101014/real 1557101014] req@ffff930a6f21ef00 x1632306954207616/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557101021 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:03:41 fir-md1-s2 kernel: Lustre: 122308:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 17:03:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 75c31e1e-77de-1d06-3ba1-5bf70911b79e (at 10.9.104.58@o2ib4) reconnecting May 05 17:03:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:03:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.104.58@o2ib4) May 05 17:03:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 314b365e-14a5-70c7-25e5-6bc358cba3f0 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931fd1eff400, cur 1557101030 expire 1557100880 last 1557100803 May 05 17:03:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:04:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 17:04:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:15:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e610d38c-5bd4-9ae6-45c1-3312b300f5a9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932be965c000, cur 1557101747 expire 1557101597 last 1557101520 May 05 17:15:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:16:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 05 17:16:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:23:08 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557102181/real 1557102181] req@ffff931f249b7800 x1632307161063312/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557102188 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 17:23:08 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 17:23:16 fir-md1-s2 kernel: Lustre: 122125:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93174fabef00 x1631582488446672/t0(0) o101->9d9a34f1-f4e0-0f10-cc72-f899159f3999@10.9.108.44@o2ib4:21/0 lens 480/568 e 1 to 0 dl 1557102201 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:23:22 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557102195/real 1557102195] req@ffff931f249b7800 x1632307161063312/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557102202 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:23:22 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 05 17:23:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9d9a34f1-f4e0-0f10-cc72-f899159f3999 (at 10.9.108.44@o2ib4) reconnecting May 05 17:23:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.44@o2ib4) May 05 17:23:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e0d1e5c9-c6ad-9c5f-0cfb-a4429801fddb (at 
10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9327d1360400, cur 1557102213 expire 1557102063 last 1557101986 May 05 17:23:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:23:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e0d1e5c9-c6ad-9c5f-0cfb-a4429801fddb (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315defe2400, cur 1557102222 expire 1557102072 last 1557101995 May 05 17:23:43 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557102216/real 1557102216] req@ffff931f249b7800 x1632307161063312/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557102223 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:23:43 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 17:23:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9d9a34f1-f4e0-0f10-cc72-f899159f3999 (at 10.9.108.44@o2ib4) reconnecting May 05 17:23:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.44@o2ib4) May 05 17:24:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9d9a34f1-f4e0-0f10-cc72-f899159f3999 (at 10.9.108.44@o2ib4) reconnecting May 05 17:24:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.44@o2ib4) May 05 17:24:25 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557102258/real 1557102258] req@ffff931f249b7800 x1632307161063312/t0(0) o106->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557102265 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 17:24:25 fir-md1-s2 kernel: Lustre: 121416:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 17:24:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 
10.9.108.44@o2ib4) May 05 17:25:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9d9a34f1-f4e0-0f10-cc72-f899159f3999 (at 10.9.108.44@o2ib4) reconnecting May 05 17:25:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:25:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.44@o2ib4) May 05 17:25:15 fir-md1-s2 kernel: Lustre: 122161:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93130b31ef00 x1631535698037312/t0(0) o101->25c05458-1ff8-5b3c-505b-360943a414ba@10.9.104.66@o2ib4:20/0 lens 480/568 e 1 to 0 dl 1557102320 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:25:25 fir-md1-s2 kernel: Lustre: 122147:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9337912cd400 x1631535003130880/t0(0) o101->10f2d4d7-14b1-b74d-6b9a-4882d722713e@10.9.104.61@o2ib4:0/0 lens 480/568 e 0 to 0 dl 1557102330 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:25:25 fir-md1-s2 kernel: Lustre: 122147:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 17:25:28 fir-md1-s2 kernel: LustreError: 121416:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from glimpse AST (req@ffff931f249b7800 x1632307161063312 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff93097f6af080/0x1c35e9de53d3c977 lrc: 10/0,0 mode: PW/PW res: [0x240026835:0x1b:0x0].0x0 bits 0x40/0x0 rrc: 15 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x33faf7351ed9ee3d expref: 398 pid: 122067 timeout: 0 lvb_type: 0 May 05 17:25:28 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 May 05 17:25:28 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 377s: evicting client at 10.8.27.23@o2ib6 ns: 
mdt-fir-MDT0001_UUID lock: ffff93097f6af080/0x1c35e9de53d3c977 lrc: 10/0,0 mode: PW/PW res: [0x240026835:0x1b:0x0].0x0 bits 0x40/0x0 rrc: 15 type: IBT flags: 0x40200000000000 nid: 10.8.27.23@o2ib6 remote: 0x33faf7351ed9ee3d expref: 399 pid: 122067 timeout: 0 lvb_type: 0 May 05 17:25:28 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message May 05 17:26:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 14597763-6ba9-eacb-a6fa-2c21e3eae766 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933ec3e64c00, cur 1557102380 expire 1557102230 last 1557102153 May 05 17:26:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 17:26:39 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 05 17:29:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 05 17:29:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:30:16 fir-md1-s2 kernel: Lustre: 121638:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930d306d6900 x1631540342916224/t0(0) o101->f7d39296-2681-999e-c9dd-38a3ef8bf584@10.9.106.15@o2ib4:21/0 lens 480/568 e 1 to 0 dl 1557102621 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:30:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f7d39296-2681-999e-c9dd-38a3ef8bf584 (at 10.9.106.15@o2ib4) reconnecting May 05 17:30:22 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 05 17:30:32 fir-md1-s2 kernel: Lustre: 122717:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557102601/real 1557102601] req@ffff930a7aeaa100 x1632307232439856/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557102632 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 17:30:32 fir-md1-s2 kernel: Lustre: 122717:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages May 
05 17:31:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client bd7325a9-b1c4-77a5-13a8-c3ddd4816d3e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930c09bb7400, cur 1557102677 expire 1557102527 last 1557102450 May 05 17:31:46 fir-md1-s2 kernel: Lustre: 122242:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93385ebbec00 x1631626483553456/t0(0) o36->553a403a-82aa-538a-0604-abb10f5fa6f2@10.9.101.33@o2ib4:21/0 lens 624/2888 e 1 to 0 dl 1557102711 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:31:55 fir-md1-s2 kernel: Lustre: 122199:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff930a29746300 x1631592790766688/t0(0) o36->3ddfc0e1-d9a8-93ac-6e7d-3e2edb9b897f@10.8.0.65@o2ib6:0/0 lens 512/2888 e 0 to 0 dl 1557102720 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:32:20 fir-md1-s2 kernel: Lustre: 122220:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931dac661b00 x1631744581452416/t0(0) o36->ea3d1456-cd4f-7735-a810-3c3f6db723ce@10.9.113.13@o2ib4:25/0 lens 600/2888 e 1 to 0 dl 1557102745 ref 2 fl Interpret:/0/0 rc 0/0 May 05 17:32:20 fir-md1-s2 kernel: Lustre: 122220:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 05 17:32:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 582e027c-5e44-ba39-3dcf-c73c4d22df06 (at 10.9.108.50@o2ib4) reconnecting May 05 17:32:31 fir-md1-s2 kernel: Lustre: Skipped 14 previous similar messages May 05 17:32:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client cc2c8b04-003e-22cf-de7e-1188f933372b (at 10.8.27.23@o2ib6) in 176 seconds. I think it's dead, and I am evicting it. 
exp ffff93304d6e0c00, cur 1557102753 expire 1557102603 last 1557102577 May 05 17:32:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:32:33 fir-md1-s2 kernel: Lustre: 122346:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (61:1s); client may timeout. req@ffff93130bbd0f00 x1631538537815328/t383233292022(0) o36->71ccab3c-8b5e-faa9-79d9-350fa5476430@10.9.104.28@o2ib4:1/0 lens 576/424 e 0 to 0 dl 1557102752 ref 1 fl Complete:/0/0 rc 0/0 May 05 17:36:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 07a0c852-b2e0-290a-b50d-405e7345e0e6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932eb3b99c00, cur 1557102979 expire 1557102829 last 1557102752 May 05 17:36:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:37:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 902d93a3-0bf2-a086-7f30-c11b3f787098 (at 10.8.11.9@o2ib6) in 182 seconds. I think it's dead, and I am evicting it. exp ffff932deb220800, cur 1557103055 expire 1557102905 last 1557102873 May 05 17:37:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:40:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 05 17:40:47 fir-md1-s2 kernel: Lustre: Skipped 23 previous similar messages May 05 17:44:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client b4bd7b93-cacc-15b1-2cfb-98eabdadef45 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b41a0f000, cur 1557103458 expire 1557103308 last 1557103231 May 05 17:44:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:53:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b2b98883-4628-1f53-309c-6ef2c28133a9 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932025b74800, cur 1557104011 expire 1557103861 last 1557103784 May 05 17:53:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 17:53:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 17:53:48 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 05 18:09:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 18:09:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 18:10:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3a266b78-35e3-8d2e-d60d-b1d3410316d8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ad7311400, cur 1557105017 expire 1557104867 last 1557104790 May 05 18:10:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 18:40:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f0de2cdc-c8d7-f876-7ae9-dfeeafee4256 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931741bc6000, cur 1557106804 expire 1557106654 last 1557106577 May 05 18:40:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 18:40:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f0de2cdc-c8d7-f876-7ae9-dfeeafee4256 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93148e3a4400, cur 1557106811 expire 1557106661 last 1557106584 May 05 18:40:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 18:40:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 19:03:32 fir-md1-s2 kernel: Lustre: 122870:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557108205/real 1557108205] req@ffff93257ea43000 x1632308161311232/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557108212 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 05 19:03:32 fir-md1-s2 kernel: Lustre: 122870:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 19:03:40 fir-md1-s2 kernel: Lustre: 122731:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93301fd0b300 x1631565559975024/t0(0) o36->7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82@10.8.27.24@o2ib6:15/0 lens 496/2888 e 1 to 0 dl 1557108225 ref 2 fl Interpret:/0/0 rc 0/0 May 05 19:03:40 fir-md1-s2 kernel: Lustre: 122731:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 05 19:03:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 05 19:03:46 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 05 19:03:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.24@o2ib6) May 05 19:03:49 fir-md1-s2 kernel: Lustre: 122004:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932b5f34bf00 x1631538527650208/t0(0) o101->b02080b5-da3e-09d3-6c8c-2d597107cbcf@10.9.108.39@o2ib4:24/0 lens 576/3264 e 1 to 0 dl 1557108234 ref 2 fl Interpret:/0/0 rc 0/0 May 05 19:03:49 fir-md1-s2 kernel: Lustre: 122004:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 05 19:03:53 fir-md1-s2 kernel: 
Lustre: 122870:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557108226/real 1557108226] req@ffff93257ea43000 x1632308161311232/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557108233 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 19:03:53 fir-md1-s2 kernel: Lustre: 122870:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 05 19:03:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.46@o2ib4) May 05 19:03:59 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 05 19:04:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 2d384d58-fd4c-f6d6-342b-6f9f296484e1 (at 10.9.101.46@o2ib4) reconnecting May 05 19:04:20 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages May 05 19:04:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.46@o2ib4) May 05 19:04:20 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 05 19:04:35 fir-md1-s2 kernel: Lustre: 122870:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557108268/real 1557108268] req@ffff93257ea43000 x1632308161311232/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557108275 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 05 19:04:35 fir-md1-s2 kernel: Lustre: 122870:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 05 19:04:58 fir-md1-s2 kernel: LustreError: 122299:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557108208, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff931dd966de80/0x1c35e9de6d76acdc lrc: 3/1,0 mode: --/PR res: [0x240026837:0xb:0x0].0x0 bits 0x13/0x0 rrc: 16 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122299 timeout: 0 lvb_type: 0 May 05 19:04:58 fir-md1-s2 kernel: LustreError: 
122299:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message May 05 19:04:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b02080b5-da3e-09d3-6c8c-2d597107cbcf (at 10.9.108.39@o2ib4) May 05 19:04:58 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages May 05 19:05:04 fir-md1-s2 kernel: LustreError: 122181:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557108214, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff931fa3b11680/0x1c35e9de6d7d363e lrc: 3/1,0 mode: --/PR res: [0x240026837:0xb:0x0].0x0 bits 0x13/0x0 rrc: 16 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122181 timeout: 0 lvb_type: 0 May 05 19:05:08 fir-md1-s2 kernel: LustreError: 121632:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557108218, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff933f7bf49f80/0x1c35e9de6d81b88f lrc: 3/1,0 mode: --/PR res: [0x240026837:0xb:0x0].0x0 bits 0x13/0x0 rrc: 16 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 121632 timeout: 0 lvb_type: 0 May 05 19:05:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7cc0f019-7fa6-17b1-76f1-8ecb3c84ba82 (at 10.8.27.24@o2ib6) reconnecting May 05 19:05:31 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 05 19:05:38 fir-md1-s2 kernel: LustreError: 122870:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff93257ea43000 x1632308161311232 status -107 rc -107), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931d6c6dd580/0x1c35e9de6d1ceec0 lrc: 4/0,0 mode: PR/PR res: [0x240025f32:0xcb4:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x5c3888f864484aa2 expref: 808 pid: 122663 timeout: 633726 lvb_type: 0 May 05 
19:05:38 fir-md1-s2 kernel: LustreError: 122870:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 05 19:05:38 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 05 19:05:38 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 133s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931d6c6dd580/0x1c35e9de6d1ceec0 lrc: 3/0,0 mode: PR/PR res: [0x240025f32:0xcb4:0x0].0x0 bits 0x5b/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x5c3888f864484aa2 expref: 809 pid: 122663 timeout: 0 lvb_type: 0 May 05 19:06:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 750359b1-25ef-0897-47c0-badbd7fbdfde (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932c6f716400, cur 1557108386 expire 1557108236 last 1557108159 May 05 19:06:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 19:06:42 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 05 19:23:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 31108e76-1364-d253-fa5d-86986522d1ba (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9309027a0400, cur 1557109383 expire 1557109233 last 1557109156 May 05 19:23:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 31108e76-1364-d253-fa5d-86986522d1ba (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b7ae1ec00, cur 1557109385 expire 1557109235 last 1557109158 May 05 19:23:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 19:23:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 20:58:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 514ad686-cd5b-be48-3af9-a061fcf7c5e8 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93207c39e400, cur 1557115107 expire 1557114957 last 1557114880 May 05 20:58:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 05 20:58:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 21:40:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client daa77fae-4b8c-75d0-1ecc-0a79e00652e7 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933f9caa7000, cur 1557117601 expire 1557117451 last 1557117374 May 05 21:40:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 21:42:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 05 21:42:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 22:37:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 901c6d03-45fd-4c09-4368-156dc13152bb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9318a326ac00, cur 1557121050 expire 1557120900 last 1557120823 May 05 22:37:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 22:38:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 22:38:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 22:43:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 22:43:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 22:43:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8e9e430a-c85d-67e0-c730-0438e3526f0d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933d1a310400, cur 1557121414 expire 1557121264 last 1557121187 May 05 22:43:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 23:22:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e711c326-9f7d-56b9-9495-262ae7e853f7 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933d11af8000, cur 1557123749 expire 1557123599 last 1557123522 May 05 23:22:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 23:22:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 23:22:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 23:50:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b1c12f7e-0a87-6b2a-55d8-d25e6cdfd920 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933de9ae2000, cur 1557125407 expire 1557125257 last 1557125180 May 05 23:50:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 05 23:50:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 05 23:50:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 01:27:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 24ca8116-461e-c903-a940-fe7ca2d04ce6 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9312ede16800, cur 1557131253 expire 1557131103 last 1557131026 May 06 01:27:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 01:34:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 06 01:34:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 01:44:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9177b8cc-fbf6-7092-3950-f0b9a9cae43a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932bfa8a1400, cur 1557132290 expire 1557132140 last 1557132063 May 06 01:44:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 01:45:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 01:45:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 01:53:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 67d4a7c9-5d29-d88b-f4cd-c33daa6af09b (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b306bfc00, cur 1557132810 expire 1557132660 last 1557132583 May 06 01:53:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 01:54:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.15.4@o2ib6) May 06 01:54:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 02:29:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d77b9fc6-f5d4-2a19-3520-fb6d1e074654 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930feeaa4c00, cur 1557134977 expire 1557134827 last 1557134750 May 06 02:29:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 02:31:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 06 02:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f49684ef-e9d0-d51f-e4f9-6eb30344d63e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9319c6ec4000, cur 1557135631 expire 1557135481 last 1557135404 May 06 02:40:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 02:47:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 02:47:18 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 03:02:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6cfba528-87e9-efcd-a4bf-8720e8486959 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93008ce41000, cur 1557136970 expire 1557136820 last 1557136743 May 06 03:02:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:04:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 03:04:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:04:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 03:04:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:13:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 7183b88b-811e-f5b2-96a6-c6fe99d7288b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932785ac8c00, cur 1557137608 expire 1557137458 last 1557137381 May 06 03:13:28 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 03:13:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7183b88b-811e-f5b2-96a6-c6fe99d7288b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9312a027a000, cur 1557137621 expire 1557137471 last 1557137394 May 06 03:14:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 03:14:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:18:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d6fba4ba-5f7d-7172-1bf9-db6eb412de93 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932782266000, cur 1557137915 expire 1557137765 last 1557137688 May 06 03:19:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 03:19:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:23:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7bc6242f-6717-4eef-3be9-f35c1f51d2fe (at 10.8.27.23@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff93129cf2c400, cur 1557138230 expire 1557138080 last 1557138003 May 06 03:23:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:24:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 7bc6242f-6717-4eef-3be9-f35c1f51d2fe (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e47690c00, cur 1557138248 expire 1557138098 last 1557138021 May 06 03:24:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 03:24:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:25:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3a103876-5e4f-5367-0db9-e25b704853af (at 10.8.26.4@o2ib6) in 207 seconds. I think it's dead, and I am evicting it. exp ffff9314577cf400, cur 1557138306 expire 1557138156 last 1557138099 May 06 03:25:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 3a103876-5e4f-5367-0db9-e25b704853af (at 10.8.26.4@o2ib6) in 225 seconds. I think it's dead, and I am evicting it. exp ffff932de7331800, cur 1557138324 expire 1557138174 last 1557138099 May 06 03:26:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 03:33:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client fa1763e1-d3e1-7734-ad19-cfa61e3b8bee (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b4fb09c00, cur 1557138783 expire 1557138633 last 1557138556 May 06 03:33:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 03:33:54 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 03:34:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 343ba3e3-d70f-a3da-f13c-3f6a9bc1bebd (at 10.8.27.23@o2ib6) in 157 seconds. I think it's dead, and I am evicting it. 
exp ffff93129cab2000, cur 1557138859 expire 1557138709 last 1557138702 May 06 03:34:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:35:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 03:35:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:42:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 520bd10e-9f0d-8e63-4df7-8d30dee0285d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93044b287400, cur 1557139340 expire 1557139190 last 1557139113 May 06 03:42:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:45:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 03:45:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:51:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a9a926b0-41e3-b96d-29a7-ef486f00a747 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930fdbf19000, cur 1557139910 expire 1557139760 last 1557139683 May 06 03:51:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 03:53:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 03:53:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 04:04:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4ff17c06-109f-408d-69ea-950054b255fd (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9308eb2a0c00, cur 1557140656 expire 1557140506 last 1557140429 May 06 04:04:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 04:04:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 04:04:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 04:16:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f12a8c98-f4fd-747b-2ce7-336123fd5d99 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9312903ad800, cur 1557141377 expire 1557141227 last 1557141150 May 06 04:16:17 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 04:17:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 04:17:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 04:34:22 fir-md1-s2 kernel: Lustre: 122005:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557142455/real 1557142455] req@ffff93129bfd1800 x1632313688479664/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557142462 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 04:34:22 fir-md1-s2 kernel: Lustre: 122005:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 06 04:34:30 fir-md1-s2 kernel: Lustre: 122197:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931e3cbd6900 x1631895805215360/t0(0) o36->38012cd9-c129-8cc6-54ac-519b05aa44c7@10.8.11.14@o2ib6:5/0 lens 512/448 e 1 to 0 dl 1557142475 ref 2 fl Interpret:/0/0 rc 0/0 May 06 04:34:30 fir-md1-s2 kernel: Lustre: 122197:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 06 04:34:36 fir-md1-s2 kernel: Lustre: 122005:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: 
[sent 1557142469/real 1557142469] req@ffff93129bfd1800 x1632313688479664/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557142476 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 04:34:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 38012cd9-c129-8cc6-54ac-519b05aa44c7 (at 10.8.11.14@o2ib6) reconnecting May 06 04:34:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 04:34:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.11.14@o2ib6) May 06 04:34:36 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 04:34:36 fir-md1-s2 kernel: Lustre: 122005:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 04:34:57 fir-md1-s2 kernel: Lustre: 122005:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557142490/real 1557142490] req@ffff93129bfd1800 x1632313688479664/t0(0) o104->fir-MDT0003@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557142497 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 04:34:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 38012cd9-c129-8cc6-54ac-519b05aa44c7 (at 10.8.11.14@o2ib6) reconnecting May 06 04:34:57 fir-md1-s2 kernel: Lustre: 122005:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 04:35:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 12182b47-d21c-6246-b9eb-d0d956754233 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93129425d000, cur 1557142524 expire 1557142374 last 1557142297 May 06 04:35:24 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 04:47:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ce07cd06-fc7e-cd7e-5154-e77e8edfdcd9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93128bac8000, cur 1557143253 expire 1557143103 last 1557143026 May 06 04:47:33 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 04:52:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 04:52:37 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 06 05:01:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f9e9b045-6235-217c-c69f-23d78fca9da8 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93105481e000, cur 1557144074 expire 1557143924 last 1557143847 May 06 05:01:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 05:03:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 05:03:15 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 05:09:39 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144572/real 1557144572] req@ffff93233ba1c500 x1632314059962304/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557144579 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 05:09:39 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 06 05:09:46 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144579/real 1557144579] req@ffff93233ba1c500 x1632314059962304/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557144586 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 05:09:47 fir-md1-s2 kernel: Lustre: 121419:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9323383ada00 x1631536647738672/t0(0) o101->6519c919-e2e8-89a4-1fa4-d0ad3d892e61@10.8.27.20@o2ib6:22/0 lens 
480/568 e 1 to 0 dl 1557144592 ref 2 fl Interpret:/0/0 rc 0/0 May 06 05:09:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 06 05:09:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 05:10:00 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144593/real 1557144593] req@ffff93233ba1c500 x1632314059962304/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557144600 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 05:10:00 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 05:10:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 06 05:10:21 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557144614/real 1557144614] req@ffff93233ba1c500 x1632314059962304/t0(0) o104->fir-MDT0001@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557144621 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 05:10:21 fir-md1-s2 kernel: Lustre: 122033:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 05:10:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 06 05:10:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6519c919-e2e8-89a4-1fa4-d0ad3d892e61 (at 10.8.27.20@o2ib6) reconnecting May 06 05:14:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 05:14:03 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 06 05:14:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 16ccca64-89bf-afcb-7d02-bc38a437f431 (at 10.8.26.4@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9312c82f4400, cur 1557144854 expire 1557144704 last 1557144627 May 06 05:14:14 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 05:29:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7121a242-338a-1ba4-f70e-31e5c799be7d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932025488800, cur 1557145747 expire 1557145597 last 1557145520 May 06 05:29:07 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 05:29:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 05:29:25 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 05:42:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 05:42:19 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 05:44:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4f3b06e6-1104-cce8-291c-d4758aea5e4e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933027f2d000, cur 1557146667 expire 1557146517 last 1557146440 May 06 05:44:27 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 05:57:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d9cb198e-d811-8068-7f12-28bd98732f10 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932350e25400, cur 1557147465 expire 1557147315 last 1557147238 May 06 05:57:45 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 06:00:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 06:00:30 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 06:08:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 706384ea-f41b-ff57-ed02-af4b44495488 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93093f347000, cur 1557148094 expire 1557147944 last 1557147867 May 06 06:08:14 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 06:16:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 06:16:25 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 06:22:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8d53a37e-b13e-696a-792a-63f4340009d8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ba5b5d800, cur 1557148940 expire 1557148790 last 1557148713 May 06 06:22:20 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 06:27:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 06:27:26 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 06:34:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 47e215b0-c979-bff7-f1bf-a67b2088f7a4 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93127964d000, cur 1557149699 expire 1557149549 last 1557149472 May 06 06:34:59 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 06:42:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 06:42:16 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 07:01:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c2a984b8-884a-1562-e36e-5bea7c087c2e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932cf9f91000, cur 1557151317 expire 1557151167 last 1557151090 May 06 07:01:57 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 07:02:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 07:02:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:08:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 97e06487-815b-ad03-ae76-bb0bf9a66e68 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93203f395400, cur 1557151721 expire 1557151571 last 1557151494 May 06 07:08:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:09:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 07:09:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:13:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5c2667ef-405b-08b1-d8c7-f2b8cfd105f2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9300a9bda000, cur 1557152022 expire 1557151872 last 1557151795 May 06 07:13:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:13:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 07:13:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:19:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 07:19:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:19:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client db5a2821-4ddb-74b9-4a39-2df36e495426 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930a277a1c00, cur 1557152392 expire 1557152242 last 1557152165 May 06 07:19:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 07:34:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 327b978a-137f-70bf-43b8-6f356295a322 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931d91a93c00, cur 1557153257 expire 1557153107 last 1557153030 May 06 07:34:17 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 07:35:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 07:35:03 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 07:48:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 60d06d9f-490e-bb13-4e7a-db59a5f7fc49 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933053b64400, cur 1557154134 expire 1557153984 last 1557153907 May 06 07:48:54 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 07:49:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 07:49:07 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 08:05:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fdef6866-4d02-f296-44b6-04d50d09abe2 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9308cc248800, cur 1557155120 expire 1557154970 last 1557154893 May 06 08:05:20 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 08:05:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 08:05:50 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 08:15:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 93b9084b-1c2e-8cf4-5955-85e835bcdcb2 (at 10.8.21.21@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9304c077a800, cur 1557155744 expire 1557155594 last 1557155517 May 06 08:15:44 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 08:16:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.21.21@o2ib6) May 06 08:16:12 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 08:26:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 17ffebea-96cb-f0bf-2d45-65dcf724fd08 (at 10.8.21.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9329fb661800, cur 1557156403 expire 1557156253 last 1557156176 May 06 08:26:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 08:27:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 08:27:32 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 08:43:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a319dd98-c842-c46e-24f6-b29aa32319a9 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931af9ebf000, cur 1557157416 expire 1557157266 last 1557157189 May 06 08:43:36 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 08:43:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 08:43:39 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 08:59:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 072fbe89-de73-a628-eacf-0142936a2a9d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931e2e3d9400, cur 1557158397 expire 1557158247 last 1557158170 May 06 08:59:57 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 09:02:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 09:02:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 09:20:01 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b5268e10-4518-b666-695a-5d9cae456d77 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9326cb30d800, cur 1557159601 expire 1557159451 last 1557159374 May 06 09:20:01 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 09:23:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 09:23:59 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 09:30:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d7e49c9f-8146-7fe5-a38a-a362e877fd64 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93195be6ac00, cur 1557160231 expire 1557160081 last 1557160004 May 06 09:30:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 09:32:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 06 09:32:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 09:49:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 09:49:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 09:56:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7e2298e1-48e8-e680-96bb-e15d97d99a0e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9318e86af000, cur 1557161813 expire 1557161663 last 1557161586 May 06 09:56:53 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 10:01:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:01:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:09:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 469a77a9-f16a-f1b8-e132-3f330ca8fc3c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9314fca85c00, cur 1557162591 expire 1557162441 last 1557162364 May 06 10:09:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:13:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:13:05 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:22:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 26a94d75-def1-f812-a8fb-3d07b8b33fa6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931277a59000, cur 1557163362 expire 1557163212 last 1557163135 May 06 10:22:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:22:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:22:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:29:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bf7ddf25-2c21-97b7-fd16-0978818b941b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930825215000, cur 1557163748 expire 1557163598 last 1557163521 May 06 10:29:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:30:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:30:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:32:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 333a536a-25d4-142d-1889-07167f2dcdc6 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931258e70800, cur 1557163926 expire 1557163776 last 1557163699 May 06 10:32:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:32:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:32:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:33:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client fc4a73cd-b94b-5573-7947-7e92e3555159 (at 10.8.26.4@o2ib6) in 202 seconds. I think it's dead, and I am evicting it. exp ffff933e1678dc00, cur 1557164002 expire 1557163852 last 1557163800 May 06 10:33:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:33:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 06 10:33:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:33:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fc4a73cd-b94b-5573-7947-7e92e3555159 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933dd3ea8400, cur 1557164027 expire 1557163877 last 1557163800 May 06 10:35:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:35:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:36:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fa94fe19-9187-6fef-639a-a6f5f2a6424e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930fd2a0a400, cur 1557164199 expire 1557164049 last 1557163972 May 06 10:45:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9223f222-77bc-063e-4d5b-9e61a153e868 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9319f62f7400, cur 1557164741 expire 1557164591 last 1557164514 May 06 10:45:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:52:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 10:52:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:56:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b5850518-2b56-b7a4-1a2a-ce10c7c061ef (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9319b0732c00, cur 1557165396 expire 1557165246 last 1557165169 May 06 10:56:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 10:56:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 06 10:56:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:02:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 21995ffb-bda9-2f51-f494-926f5bb5a4ab (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b2e256c00, cur 1557165760 expire 1557165610 last 1557165533 May 06 11:02:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:05:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 11:05:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:10:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 11:10:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:10:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1235fdf1-ea0b-2ff3-7b7a-5905efcd76aa (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e36143000, cur 1557166256 expire 1557166106 last 1557166029 May 06 11:10:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:23:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 11:23:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:24:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4ce1a997-ba24-0e18-3bc8-a3a73955800e (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932698e5d000, cur 1557167084 expire 1557166934 last 1557166857 May 06 11:24:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:28:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 11:28:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:31:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 11:31:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:38:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7292487d-4c3f-beaa-f7a3-4a39edecde1a (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9323596a3800, cur 1557167906 expire 1557167756 last 1557167679 May 06 11:38:26 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 11:46:03 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168356/real 1557168356] req@ffff931c8ebffb00 x1632318386489152/t0(0) o104->fir-MDT0001@10.8.1.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557168363 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 11:46:03 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 11:46:10 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168363/real 1557168363] req@ffff931c8ebffb00 x1632318386489152/t0(0) o104->fir-MDT0001@10.8.1.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557168370 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:46:11 fir-md1-s2 kernel: Lustre: 122314:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931b5d70dd00 x1632257678662768/t0(0) o101->87dc7894-17ad-35e9-debd-4fd0a600b9db@10.8.1.3@o2ib6:16/0 lens 1784/3288 
e 1 to 0 dl 1557168376 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:46:12 fir-md1-s2 kernel: Lustre: 122248:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93228eee4b00 x1631543367646240/t0(0) o101->2e1837bb-385a-af64-a5d1-7a58230af8b2@10.9.0.64@o2ib4:17/0 lens 584/3264 e 1 to 0 dl 1557168377 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:46:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 87dc7894-17ad-35e9-debd-4fd0a600b9db (at 10.8.1.3@o2ib6) reconnecting May 06 11:46:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.3@o2ib6) May 06 11:46:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.0.64@o2ib4) May 06 11:46:24 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168377/real 1557168377] req@ffff931c8ebffb00 x1632318386489152/t0(0) o104->fir-MDT0001@10.8.1.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557168384 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:46:24 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 06 11:46:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 87dc7894-17ad-35e9-debd-4fd0a600b9db (at 10.8.1.3@o2ib6) reconnecting May 06 11:46:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:46:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.3@o2ib6) May 06 11:46:45 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168398/real 1557168398] req@ffff931c8ebffb00 x1632318386489152/t0(0) o104->fir-MDT0001@10.8.1.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557168405 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:46:45 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 06 11:46:47 fir-md1-s2 kernel: Lustre: 
fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 06 11:46:47 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 11:46:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 87dc7894-17ad-35e9-debd-4fd0a600b9db (at 10.8.1.3@o2ib6) reconnecting May 06 11:46:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 11:46:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.3@o2ib6) May 06 11:47:04 fir-md1-s2 kernel: Lustre: 121633:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931b5ae13850 x1632180781631824/t0(0) o101->a15481a3-7f9b-2fb5-a19f-95b625f6846e@10.8.1.2@o2ib6:9/0 lens 576/3264 e 0 to 0 dl 1557168429 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:47:14 fir-md1-s2 kernel: Lustre: 9075:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-1), not sending early reply req@ffff932f7526b300 x1632257678668480/t0(0) o101->87dc7894-17ad-35e9-debd-4fd0a600b9db@10.8.1.3@o2ib6:19/0 lens 576/3264 e 1 to 0 dl 1557168439 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:47:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 87dc7894-17ad-35e9-debd-4fd0a600b9db (at 10.8.1.3@o2ib6) reconnecting May 06 11:47:20 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 11:47:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.3@o2ib6) May 06 11:47:20 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 11:47:27 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168440/real 1557168440] req@ffff931c8ebffb00 x1632318386489152/t0(0) o104->fir-MDT0001@10.8.1.8@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557168447 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:47:27 fir-md1-s2 kernel: Lustre: 122145:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 11:47:27 fir-md1-s2 kernel: 
LustreError: 49301:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557168357, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff933c35fb8000/0x1c35e9df4e98a67e lrc: 3/1,0 mode: --/PR res: [0x240023fff:0x8af9:0x0].0x0 bits 0x13/0x0 rrc: 17 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 49301 timeout: 0 lvb_type: 0 May 06 11:47:49 fir-md1-s2 kernel: Lustre: 122038:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932c32791500 x1632180691911024/t0(0) o101->7b8ef936-4105-b014-ef94-c36c21315324@10.8.1.1@o2ib6:24/0 lens 576/3264 e 0 to 0 dl 1557168474 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:47:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7b8ef936-4105-b014-ef94-c36c21315324 (at 10.8.1.1@o2ib6) reconnecting May 06 11:47:55 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 06 11:47:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.1@o2ib6) May 06 11:47:55 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 06 11:48:07 fir-md1-s2 kernel: Lustre: 122054:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931b37ec9b00 x1631534153696560/t0(0) o101->d3013375-2e90-b76e-c4d8-76867f2b4a32@10.8.2.20@o2ib6:12/0 lens 480/568 e 0 to 0 dl 1557168492 ref 2 fl Interpret:/0/0 rc 0/0 May 06 11:48:07 fir-md1-s2 kernel: Lustre: 122054:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 06 11:48:09 fir-md1-s2 kernel: LustreError: 122222:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557168399, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff931a7dbaca40/0x1c35e9df4ec11015 lrc: 3/1,0 mode: --/PR res: [0x240023fff:0x8af9:0x0].0x0 bits 0x13/0x0 rrc: 17 
type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122222 timeout: 0 lvb_type: 0 May 06 11:48:23 fir-md1-s2 kernel: LustreError: 122690:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557168413, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff932be8d48000/0x1c35e9df4ecec545 lrc: 3/1,0 mode: --/PR res: [0x240023fff:0x8af9:0x0].0x0 bits 0x13/0x0 rrc: 17 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122690 timeout: 0 lvb_type: 0 May 06 11:48:30 fir-md1-s2 kernel: LustreError: 122145:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.1.8@o2ib6) failed to reply to blocking AST (req@ffff931c8ebffb00 x1632318386489152 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff933d156ce0c0/0x1c35e9df4e911b3b lrc: 4/0,0 mode: PR/PR res: [0x240023fff:0x8af9:0x0].0x0 bits 0x13/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.1.8@o2ib6 remote: 0x2ce4b3d7751ed0fb expref: 210 pid: 121639 timeout: 693891 lvb_type: 0 May 06 11:48:30 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.1.8@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 06 11:48:30 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.1.8@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff933d156ce0c0/0x1c35e9df4e911b3b lrc: 3/0,0 mode: PR/PR res: [0x240023fff:0x8af9:0x0].0x0 bits 0x13/0x0 rrc: 17 type: IBT flags: 0x60200400000020 nid: 10.8.1.8@o2ib6 remote: 0x2ce4b3d7751ed0fb expref: 211 pid: 121639 timeout: 0 lvb_type: 0 May 06 11:48:45 fir-md1-s2 kernel: Lustre: 122166:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557168518/real 1557168518] req@ffff931622ac3600 x1632318405127392/t0(0) o106->fir-MDT0001@10.9.106.3@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557168525 
ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 11:48:45 fir-md1-s2 kernel: Lustre: 122166:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 17 previous similar messages May 06 11:48:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6e281d64-50b1-ba40-7a98-35184b0fb522 (at 10.8.1.17@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315e46f0000, cur 1557168531 expire 1557168381 last 1557168304 May 06 11:48:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 12:00:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client acede4dd-4e5a-15d9-b600-94f6cf908bdb (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931634fdd800, cur 1557169253 expire 1557169103 last 1557169026 May 06 12:00:53 fir-md1-s2 kernel: Lustre: Skipped 58 previous similar messages May 06 12:01:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 12:01:30 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 06 12:06:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 12:06:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 12:17:10 fir-md1-s2 kernel: Lustre: 122455:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930f91ede850 x1631546723084608/t0(0) o4->77feb475-136e-a3f9-5453-fd421c405ed4@10.9.107.5@o2ib4:15/0 lens 2560/448 e 1 to 0 dl 1557170235 ref 2 fl Interpret:/0/0 rc 0/0 May 06 12:17:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 77feb475-136e-a3f9-5453-fd421c405ed4 (at 10.9.107.5@o2ib4) reconnecting May 06 12:17:17 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 06 12:17:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.107.5@o2ib4) May 06 12:17:23 fir-md1-s2 kernel: Lustre: 
122184:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:8s); client may timeout. req@ffff930f91ede850 x1631546723084608/t296744212945(0) o4->77feb475-136e-a3f9-5453-fd421c405ed4@10.9.107.5@o2ib4:15/0 lens 2560/416 e 1 to 0 dl 1557170235 ref 1 fl Complete:/0/0 rc 0/0 May 06 12:33:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e82f6d89-d034-c003-3b7e-5d3893bce363 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ee688d400, cur 1557171212 expire 1557171062 last 1557170985 May 06 12:33:32 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 06 12:39:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 12:39:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 12:43:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d2d138e7-a262-89db-c817-b962448e9f59 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930be635e000, cur 1557171813 expire 1557171663 last 1557171586 May 06 12:43:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 12:43:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 12:43:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 12:50:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client b18b7a53-be90-2f32-7eae-8c062d22c266 (at 10.9.108.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315c335a000, cur 1557172226 expire 1557172076 last 1557171999 May 06 12:50:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:06:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9fb03ea9-0aba-6ac3-73c3-4b8f9d8e2193 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93080bbda800, cur 1557173189 expire 1557173039 last 1557172962 May 06 13:06:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:07:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 13:07:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:07:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f7bddeb5-cf2d-7b09-7625-5fdb36d4978d (at 10.8.27.14@o2ib6) in 186 seconds. I think it's dead, and I am evicting it. exp ffff931b17fbec00, cur 1557173265 expire 1557173115 last 1557173079 May 06 13:07:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:11:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 56f89491-06d9-7455-5e64-eb6ec04ed229 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93175528a000, cur 1557173495 expire 1557173345 last 1557173268 May 06 13:11:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:11:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 13:11:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:13:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 1067cc99-569b-6c01-e8cd-7bfbb2eea42a (at 10.8.14.5@o2ib6) May 06 13:13:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:13:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.6@o2ib6) May 06 13:13:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:14:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.7@o2ib6) May 06 13:14:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:14:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 770c7550-60ce-b00d-c0ae-73d52a13d9c0 (at 10.8.13.23@o2ib6) May 06 13:14:39 
fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 13:15:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.9.1@o2ib6) May 06 13:15:03 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 06 13:15:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.20@o2ib4) May 06 13:15:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:16:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client af4685da-89b6-0ca2-9a5a-3f2a8408f108 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931a037b6800, cur 1557173802 expire 1557173652 last 1557173575 May 06 13:16:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:16:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to e2f2980b-3f7f-4d9e-21af-41a9302e7720 (at 10.9.104.41@o2ib4) May 06 13:16:51 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 13:19:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.17@o2ib6) May 06 13:19:03 fir-md1-s2 kernel: Lustre: Skipped 21 previous similar messages May 06 13:25:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6e6fa5c3-eb03-711c-667c-55b18b265db5 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932b5e64a000, cur 1557174358 expire 1557174208 last 1557174131 May 06 13:25:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 13:44:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 06 13:44:10 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 14:00:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f8994245-bed0-7e79-3bee-894973fb0d61 (at 10.8.1.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9315e42f0400, cur 1557176420 expire 1557176270 last 1557176193 May 06 14:00:20 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 14:03:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 42bc888f-d04c-9162-e1f7-365337eb9a74 (at 10.8.10.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933079040c00, cur 1557176584 expire 1557176434 last 1557176357 May 06 14:03:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:18:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 29ba976c-0339-3250-27f8-d032be65fedc (at 10.8.17.3@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93105146e800, cur 1557177484 expire 1557177334 last 1557177257 May 06 14:18:04 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 14:29:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d02451d9-5951-f2b9-906f-fc0a3df2d1d2 (at 10.8.14.4@o2ib6) May 06 14:29:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:32:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bd62463f-9754-d7dd-c0ae-947965e600cc (at 10.8.14.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93202fe83400, cur 1557178334 expire 1557178184 last 1557178107 May 06 14:32:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:33:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5d6d99c9-66e9-1b74-491b-ca51f1cc6e94 (at 10.8.13.24@o2ib6) in 219 seconds. I think it's dead, and I am evicting it. exp ffff930b76ffac00, cur 1557178410 expire 1557178260 last 1557178191 May 06 14:33:30 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 14:34:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e7f1f31f-82d4-20eb-7b7d-3bdf55af6073 (at 10.8.14.6@o2ib6) in 214 seconds. I think it's dead, and I am evicting it. 
exp ffff931ec4343000, cur 1557178486 expire 1557178336 last 1557178272 May 06 14:34:46 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 14:51:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 1067cc99-569b-6c01-e8cd-7bfbb2eea42a (at 10.8.14.5@o2ib6) May 06 14:51:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:52:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.9@o2ib6) May 06 14:52:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:54:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.112.13@o2ib4) May 06 14:54:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:56:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 1aa535b7-e80b-2e8d-f6eb-863a659de9ae (at 10.9.108.54@o2ib4) May 06 14:56:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:56:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.112.17@o2ib4) May 06 14:56:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 14:57:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.107.33@o2ib4) May 06 14:57:29 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 14:57:55 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.14.2@o2ib6) May 06 14:57:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.2@o2ib6) May 06 14:57:55 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 06 15:00:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.105.12@o2ib4) May 06 15:00:01 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 15:04:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.105.4@o2ib4) May 06 15:04:09 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 15:20:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: 
Connection restored to (at 10.8.27.14@o2ib6) May 06 15:20:22 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 15:21:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.17.3@o2ib6) May 06 15:21:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 15:23:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.10@o2ib6) May 06 15:23:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 15:43:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2f17b0c4-f886-f86e-900b-0be5810e3dce (at 10.8.1.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93279835a000, cur 1557182615 expire 1557182465 last 1557182388 May 06 15:43:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:15:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.13.24@o2ib6) May 06 16:15:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:16:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.6@o2ib6) May 06 16:16:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:26:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3c74c277-aa1c-7cd2-ac13-ab73bee6c6b3 (at 10.8.1.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930732a96000, cur 1557185201 expire 1557185051 last 1557184974 May 06 16:26:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:33:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.3@o2ib6) May 06 16:33:24 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 06 16:34:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.15.4@o2ib6) May 06 16:34:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:35:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3b67c5bf-e6bd-e83d-48ff-52eff19df7b4 (at 10.9.109.46@o2ib4) May 06 16:35:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:35:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 040e0374-d725-24d4-2f80-077b03acafdc (at 10.9.109.48@o2ib4) May 06 16:35:35 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 16:37:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to c3c0b3ca-d32d-b2fd-5925-47b1f0ecab26 (at 10.9.109.36@o2ib4) May 06 16:37:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 16:39:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 41a282e1-5a8a-9807-65d1-c1bc5cec1068 (at 10.9.109.20@o2ib4) May 06 16:39:05 fir-md1-s2 kernel: Lustre: Skipped 53 previous similar messages May 06 16:46:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.5@o2ib6) May 06 16:46:23 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 06 17:01:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 29aab465-7cf3-868c-3ff7-4901255a2788 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932013ab8800, cur 1557187260 expire 1557187110 last 1557187033 May 06 17:01:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 17:24:21 fir-md1-s2 kernel: LNetError: 121169:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 18:38:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a63699e6-d0e7-040a-1db9-cc9068a34f0c (at 10.8.20.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930be8ba2800, cur 1557193111 expire 1557192961 last 1557192884 May 06 18:38:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 19:07:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 718e2070-f49a-08c6-62d7-7d002d5b938d (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93163f79dc00, cur 1557194844 expire 1557194694 last 1557194617 May 06 19:07:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 19:07:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 718e2070-f49a-08c6-62d7-7d002d5b938d (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93278cebc800, cur 1557194846 expire 1557194696 last 1557194619 May 06 19:14:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a2b26b96-a929-8f63-cbb2-ebc7ce2d142e (at 10.8.1.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93272ba23c00, cur 1557195269 expire 1557195119 last 1557195042 May 06 20:08:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 48bf043f-4a40-2dca-0013-959a568eb747 (at 10.8.1.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93106a6c3c00, cur 1557198509 expire 1557198359 last 1557198282 May 06 20:08:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 20:10:53 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:23:17 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:28:16 fir-md1-s2 kernel: LNetError: 121175:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:43:00 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:43:49 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:46:58 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:50:37 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 20:56:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d3ad20a5-81a2-915a-58a3-1542c85784cf (at 10.9.107.53@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932e326c9c00, cur 1557201411 expire 1557201261 last 1557201184 May 06 20:56:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 20:59:54 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:01:41 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:05:51 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:08:45 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:15:00 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:21:54 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:30:08 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:30:08 fir-md1-s2 kernel: LNetError: 121174:0:(lib-msg.c:811:lnet_is_health_check()) Skipped 1 previous similar message May 06 21:42:13 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 21:46:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 77874b32-e186-7da5-3231-7675bcd6ec17 (at 10.9.102.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931761bd8c00, cur 1557204383 expire 1557204233 last 1557204156 May 06 21:46:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 22:18:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a99a0fd8-be49-f5cf-71c5-91c5b8a9ee37 (at 10.9.102.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930faea7e800, cur 1557206318 expire 1557206168 last 1557206091 May 06 22:18:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 22:27:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2157fc51-ad2b-af56-95eb-ebec24704c29 (at 10.8.1.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932013ab8400, cur 1557206869 expire 1557206719 last 1557206642 May 06 22:27:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 22:37:18 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 22:53:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 42c27884-7f5f-6ccb-073b-56ac764ed5ce (at 10.9.103.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b47377400, cur 1557208407 expire 1557208257 last 1557208180 May 06 22:53:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 23:05:05 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 06 23:24:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d0c39c1f-f040-e7be-8021-8fbd0c8d72d2 (at 10.8.11.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93272ba26400, cur 1557210270 expire 1557210120 last 1557210043 May 06 23:24:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 23:24:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d0c39c1f-f040-e7be-8021-8fbd0c8d72d2 (at 10.8.11.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931fe2f5e400, cur 1557210274 expire 1557210124 last 1557210047 May 06 23:39:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1e35dbc2-c2aa-b6ce-6f18-8939c99fde0f (at 10.8.30.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930796283c00, cur 1557211179 expire 1557211029 last 1557210952 May 06 23:39:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1e35dbc2-c2aa-b6ce-6f18-8939c99fde0f (at 10.8.30.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b20ed4000, cur 1557211180 expire 1557211030 last 1557210953 May 06 23:58:01 fir-md1-s2 kernel: Lustre: 122648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557212274/real 1557212274] req@ffff930059df8900 x1632326715740688/t0(0) o104->fir-MDT0001@10.8.11.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557212281 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 06 23:58:19 fir-md1-s2 kernel: Lustre: 122646:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff930f90c75a00 x1631900899550240/t0(0) o36->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:24/0 lens 496/448 e 0 to 0 dl 1557212304 ref 2 fl Interpret:/0/0 rc 0/0 May 06 23:58:22 fir-md1-s2 kernel: Lustre: 122648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557212295/real 1557212295] req@ffff930059df8900 x1632326715740688/t0(0) o104->fir-MDT0001@10.8.11.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557212302 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 23:58:22 fir-md1-s2 kernel: Lustre: 
122648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 06 23:58:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9a8bc7f0-674a-721d-c255-50108001b9f0 (at 10.8.0.66@o2ib6) reconnecting May 06 23:58:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.0.66@o2ib6) May 06 23:58:25 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 06 23:58:50 fir-md1-s2 kernel: Lustre: 122350:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-28), not sending early reply req@ffff93194e798300 x1631900899585312/t0(0) o101->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:25/0 lens 576/3264 e 0 to 0 dl 1557212335 ref 2 fl Interpret:/0/0 rc 0/0 May 06 23:58:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9a8bc7f0-674a-721d-c255-50108001b9f0 (at 10.8.0.66@o2ib6) reconnecting May 06 23:59:04 fir-md1-s2 kernel: Lustre: 122648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557212337/real 1557212337] req@ffff930059df8900 x1632326715740688/t0(0) o104->fir-MDT0001@10.8.11.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557212344 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 06 23:59:04 fir-md1-s2 kernel: Lustre: 122648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 06 23:59:27 fir-md1-s2 kernel: Lustre: 122281:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932753bc6900 x1631900899815952/t0(0) o101->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:2/0 lens 576/3264 e 0 to 0 dl 1557212372 ref 2 fl Interpret:/0/0 rc 0/0 May 06 23:59:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9a8bc7f0-674a-721d-c255-50108001b9f0 (at 10.8.0.66@o2ib6) reconnecting May 06 23:59:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.0.66@o2ib6) May 06 23:59:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 06 23:59:32 
fir-md1-s2 kernel: LustreError: 122606:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557212282, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff9327732ead00/0x1c35e9e440b2e7f2 lrc: 3/1,0 mode: --/PR res: [0x240025f1a:0x2473:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122606 timeout: 0 lvb_type: 0 May 06 23:59:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9a8bc7f0-674a-721d-c255-50108001b9f0 (at 10.8.0.66@o2ib6) reconnecting May 07 00:00:22 fir-md1-s2 kernel: Lustre: 122648:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557212414/real 1557212414] req@ffff930059df8900 x1632326715740688/t0(0) o104->fir-MDT0001@10.8.11.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557212421 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 00:00:22 fir-md1-s2 kernel: Lustre: 122648:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages May 07 00:00:27 fir-md1-s2 kernel: Lustre: 122222:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9315beecd700 x1631900900065184/t0(0) o101->9a8bc7f0-674a-721d-c255-50108001b9f0@10.8.0.66@o2ib6:2/0 lens 576/3264 e 0 to 0 dl 1557212432 ref 2 fl Interpret:/0/0 rc 0/0 May 07 00:00:29 fir-md1-s2 kernel: LustreError: 122648:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.11.3@o2ib6) failed to reply to blocking AST (req@ffff930059df8900 x1632326715740688 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931804208b40/0x1c35e9deb379802f lrc: 4/0,0 mode: PR/PR res: [0x240025f1a:0x2473:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.11.3@o2ib6 remote: 0x75386a23e78b35f8 expref: 15 pid: 121467 timeout: 737809 lvb_type: 0 May 07 00:00:29 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 
10.8.11.3@o2ib6 was evicted due to a lock blocking callback time out: rc -110 May 07 00:00:29 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.11.3@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff931804208b40/0x1c35e9deb379802f lrc: 3/0,0 mode: PR/PR res: [0x240025f1a:0x2473:0x0].0x0 bits 0x13/0x0 rrc: 12 type: IBT flags: 0x60200400000020 nid: 10.8.11.3@o2ib6 remote: 0x75386a23e78b35f8 expref: 16 pid: 121467 timeout: 0 lvb_type: 0 May 07 00:00:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4d220cc4-d3e3-4651-12cb-6e4bd675cf57 (at 10.8.22.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93075db4fc00, cur 1557212452 expire 1557212302 last 1557212225 May 07 00:56:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 02143cf9-a33a-721d-1d20-5e9ca2a6e670 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93076d251000, cur 1557215774 expire 1557215624 last 1557215547 May 07 00:56:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 00:56:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 02143cf9-a33a-721d-1d20-5e9ca2a6e670 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931c7b209000, cur 1557215776 expire 1557215626 last 1557215549 May 07 00:57:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 07 00:57:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 02:02:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8291e59c-ac50-4b3f-bf9b-4a485234c804 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930c82c74400, cur 1557219760 expire 1557219610 last 1557219533 May 07 02:03:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8291e59c-ac50-4b3f-bf9b-4a485234c804 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93370f227000, cur 1557219784 expire 1557219634 last 1557219557 May 07 02:08:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 07 02:08:37 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 02:24:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b9d08568-0dfb-b72b-5b71-42d51a8d04c5 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932a63211000, cur 1557221074 expire 1557220924 last 1557220847 May 07 02:27:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 00e1f138-c014-8d58-b144-b69a54760ae8 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932de2e40800, cur 1557221256 expire 1557221106 last 1557221029 May 07 02:27:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 02:29:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 07 02:29:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 02:31:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 07 02:31:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 02:49:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 32e552d9-553a-99e0-f22a-c15ac116b169 (at 10.9.108.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315f5311000, cur 1557222571 expire 1557222421 last 1557222344 May 07 02:49:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 02:49:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 32e552d9-553a-99e0-f22a-c15ac116b169 (at 10.9.108.5@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff930be8ba1000, cur 1557222574 expire 1557222424 last 1557222347 May 07 03:03:40 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 03:19:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 02ea929c-a46f-c52b-c40b-0790522b7de6 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931046f4e000, cur 1557224375 expire 1557224225 last 1557224148 May 07 03:20:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.29@o2ib6) May 07 03:20:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 03:26:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 385ad677-c9f9-e2fe-015c-c152ce073ee0 (at 10.9.102.57@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930765be5800, cur 1557224792 expire 1557224642 last 1557224565 May 07 03:26:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 03:44:17 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 03:51:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d4b920a7-fa43-47d1-36f8-6c5e2715eb6d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93004ef15000, cur 1557226306 expire 1557226156 last 1557226079 May 07 03:51:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 03:52:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d4b920a7-fa43-47d1-36f8-6c5e2715eb6d (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931762260c00, cur 1557226327 expire 1557226177 last 1557226100 May 07 03:54:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 03:54:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:01:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 08876076-308b-16ed-fef6-3539ec3414c1 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9324d53ea000, cur 1557226891 expire 1557226741 last 1557226664 May 07 04:01:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:01:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:06:35 fir-md1-s2 kernel: LNetError: 121173:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 04:08:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2f3c3b21-f910-699c-fb11-7d9409e39ccb (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931ab77b6000, cur 1557227337 expire 1557227187 last 1557227110 May 07 04:08:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:09:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:09:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:17:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2ce2ff3c-6f50-7561-f990-2838c405e040 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93176be05400, cur 1557227837 expire 1557227687 last 1557227610 May 07 04:17:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:17:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:17:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:24:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b819a236-3c20-45c6-c515-a5f31f7fdb5f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9318c3b77400, cur 1557228277 expire 1557228127 last 1557228050 May 07 04:24:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:24:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:24:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:28:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:28:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:28:45 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 5e0de70d-f314-c595-2c05-b0f5e1db8013 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931d7e0a9000, cur 1557228525 expire 1557228375 last 1557228298 May 07 04:28:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:34:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 74fb56c5-8bc6-38a9-8624-788945b7232f (at 10.9.115.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93207c0f7800, cur 1557228864 expire 1557228714 last 1557228637 May 07 04:34:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:34:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:34:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:35:07 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 04:42:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a8459b8b-f733-b042-6d1e-7ebe1321b89f (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9316133d2400, cur 1557229345 expire 1557229195 last 1557229118 May 07 04:42:25 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 04:42:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:42:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:47:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 72f7533c-a81d-633d-f6a2-6cb31c1ace4e (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932a8c218800, cur 1557229647 expire 1557229497 last 1557229420 May 07 04:47:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:48:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 04:48:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:54:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f3b84fa8-506b-6fb1-c444-1dbe27e3ce68 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931978b5c000, cur 1557230061 expire 1557229911 last 1557229834 May 07 04:54:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 04:57:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 07 04:57:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.11.9@o2ib6) May 07 05:03:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d6b92241-7105-f38f-e007-be4fc19fe9ee (at 10.8.11.9@o2ib6) in 170 seconds. I think it's dead, and I am evicting it. exp ffff93204ac86c00, cur 1557230593 expire 1557230443 last 1557230423 May 07 05:03:13 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 05:05:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 07 05:05:10 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 05:14:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.9@o2ib6) May 07 05:14:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 05:17:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 17240575-f5ba-c466-4ce0-cfd1b3054463 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93084b918800, cur 1557231426 expire 1557231276 last 1557231199 May 07 05:17:06 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:28:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7dbd9b09-ecbf-56f3-78ec-84aa8794d98c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b7fb6f000, cur 1557232110 expire 1557231960 last 1557231883 May 07 05:28:30 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:29:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 05:29:22 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:45:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8585c42f-96ef-d59f-ea1a-9cf9f34c62ca (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931d137ecc00, cur 1557233143 expire 1557232993 last 1557232916 May 07 05:45:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:46:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 05:46:42 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:56:48 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 57f5ff3d-ed7e-8bd2-f4f8-c6cbe70ab7a7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932fb6e32000, cur 1557233808 expire 1557233658 last 1557233581 May 07 05:56:48 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:57:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 05:57:46 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 05:58:18 fir-md1-s2 kernel: LNetError: 121180:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 06:09:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 06:09:11 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:09:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f931f4dd-24d0-4e09-5490-0594b1a39382 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933d06f1f400, cur 1557234576 expire 1557234426 last 1557234349 May 07 06:09:36 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:22:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ba0442bc-14d6-8675-c23a-8dc65b1fa6a6 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932bfaf28c00, cur 1557235362 expire 1557235212 last 1557235135 May 07 06:22:42 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:23:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 06:23:31 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:35:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 56121a8a-fa26-ef94-bb8a-0996ba74499b (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9307494a4c00, cur 1557236138 expire 1557235988 last 1557235911 May 07 06:35:38 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:36:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 06:36:46 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:51:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 06:51:56 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 06:52:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0ceb4a36-d485-0e50-ab89-95eb861c4946 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931e0e7dec00, cur 1557237145 expire 1557236995 last 1557236918 May 07 06:52:25 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 07:07:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 833403f0-6f37-5af2-6362-829570a40278 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93145ab9fc00, cur 1557238029 expire 1557237879 last 1557237802 May 07 07:07:09 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 07:09:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 07:09:01 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 07:23:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bfdafa74-3349-cb55-da4a-642f21523763 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9327a2f05c00, cur 1557239004 expire 1557238854 last 1557238777 May 07 07:23:24 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 07:23:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 07:23:48 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 07:35:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 03736220-e9e7-7d88-5819-8709177dceb7 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f3df8cc00, cur 1557239726 expire 1557239576 last 1557239499 May 07 07:35:26 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 07:36:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 07:36:25 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 07:47:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cd42bd14-63e7-d4ae-3f6d-cfdf4625d90c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b6177c400, cur 1557240473 expire 1557240323 last 1557240246 May 07 07:47:53 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 07:48:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 07:48:30 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 08:01:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bda33544-34db-f855-cae6-56a4d4826687 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930d45768c00, cur 1557241276 expire 1557241126 last 1557241049 May 07 08:01:16 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 08:01:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 08:01:44 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 08:42:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7676a4e6-cd18-5e4a-0855-17e7cc11eb46 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931a6075a400, cur 1557243735 expire 1557243585 last 1557243508 May 07 08:42:15 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 08:49:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.15@o2ib6) May 07 08:49:38 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 08:50:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 30831171-7bbb-a8f0-d2eb-e9a35e74d0e8 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9306a689f000, cur 1557244259 expire 1557244109 last 1557244032 May 07 08:50:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 08:55:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fbe55575-ffec-483a-9f4b-6c9765813e37 (at 10.9.112.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93207b956c00, cur 1557244515 expire 1557244365 last 1557244288 May 07 08:55:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 08:56:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 07 08:56:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 09:03:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.5@o2ib4) May 07 09:03:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 09:05:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c9336415-f6c5-883d-24d5-6a752dd59401 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932de62de000, cur 1557245158 expire 1557245008 last 1557244931 May 07 09:05:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 09:09:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 09:09:45 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 09:15:51 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 09:18:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 08569354-3ed8-bb43-01f8-a20f1ea467bc (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930da9eb5400, cur 1557245915 expire 1557245765 last 1557245688 May 07 09:18:35 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 09:28:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 09:28:40 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages May 07 09:33:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7ac77253-ddd7-7c91-beea-fd5dfde008b3 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931055d27400, cur 1557246824 expire 1557246674 last 1557246597 May 07 09:33:44 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 09:43:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 07 09:43:08 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 09:50:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0bba6657-cd54-9ad7-b9ba-120aef5c9cad (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9316a7697c00, cur 1557247841 expire 1557247691 last 1557247614 May 07 09:50:41 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 10:00:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 10:00:10 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 10:09:16 fir-md1-s2 kernel: LNetError: 121178:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 10:13:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e0c7c099-87a1-d05e-fd38-39feb1f5c5a1 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932e57732400, cur 1557249217 expire 1557249067 last 1557248990 May 07 10:13:37 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 10:13:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 10:13:51 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 10:17:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f6a81a90-ae4f-0e51-e817-632d30c780f4 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931ddd23ec00, cur 1557249461 expire 1557249311 last 1557249234 May 07 10:17:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 10:21:57 fir-md1-s2 kernel: LNetError: 121177:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 10:24:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e8caa1e0-95d9-b020-e242-0b07ce07b35c (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933702b42400, cur 1557249849 expire 1557249699 last 1557249622 May 07 10:24:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 10:24:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 10:24:49 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 10:37:27 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 10:42:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1ef4bd81-95f3-029a-2025-6401b9a7961e (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f18d34800, cur 1557250954 expire 1557250804 last 1557250727 May 07 10:42:34 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 10:44:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 10:44:24 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 10:54:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 40f9c724-6bc8-1588-d4af-684773732c28 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931821f38800, cur 1557251645 expire 1557251495 last 1557251418 May 07 10:54:05 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 10:54:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 10:54:28 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 11:00:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 30eece9e-b079-bed9-0d97-76b9b6aa4aa3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931704b57800, cur 1557252023 expire 1557251873 last 1557251796 May 07 11:00:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 11:01:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 06a346a7-13b1-aaef-624b-fa1575cee4b9 (at 10.8.27.23@o2ib6) in 186 seconds. I think it's dead, and I am evicting it. exp ffff932aa066c000, cur 1557252099 expire 1557251949 last 1557251913 May 07 11:01:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 11:07:10 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3f97eb9f-1a20-7179-6229-0e033571f763 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932fb271bc00, cur 1557252430 expire 1557252280 last 1557252203 May 07 11:07:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 11:08:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 11:08:03 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 11:08:35 fir-md1-s2 kernel: Lustre: 122151:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557252508/real 1557252508] req@ffff933058f0f200 x1632334090666176/t0(0) o104->fir-MDT0001@10.9.101.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557252515 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:08:35 fir-md1-s2 kernel: Lustre: 122151:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 07 11:08:44 fir-md1-s2 kernel: Lustre: 122222:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931310eadd00 x1632821712021424/t0(0) o101->dcf713a7-e466-63ba-d7fd-f8bd360f2bc9@10.9.108.54@o2ib4:19/0 lens 480/568 e 1 to 0 dl 1557252529 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:08:44 fir-md1-s2 kernel: Lustre: 122222:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 11:08:44 fir-md1-s2 kernel: Lustre: 122150:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932f2b430c00 x1631548781639616/t0(0) o101->ff5adeab-1158-6b77-bf0c-cf0414b48364@10.9.101.55@o2ib4:19/0 lens 480/568 e 1 to 0 dl 1557252529 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:08:45 fir-md1-s2 kernel: Lustre: 122069:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9329adbcec00 x1631551421880096/t0(0) o101->773c3f88-b988-c40f-0857-260c7cbe8aa4@10.9.101.29@o2ib4:20/0 lens 480/568 e 1 to 0 dl 1557252530 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:08:45 fir-md1-s2 kernel: 
Lustre: 122069:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 07 11:08:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client ff5adeab-1158-6b77-bf0c-cf0414b48364 (at 10.9.101.55@o2ib4) reconnecting May 07 11:08:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client e891cc28-9c10-be1b-29fe-00592513d891 (at 10.9.101.41@o2ib4) reconnecting May 07 11:08:51 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 11:08:53 fir-md1-s2 kernel: Lustre: 122730:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932fbc202d00 x1631562889344048/t0(0) o101->d70a6b22-11d0-75ad-65d1-77429eca863f@10.9.108.51@o2ib4:28/0 lens 480/568 e 0 to 0 dl 1557252538 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:08:54 fir-md1-s2 kernel: Lustre: 122720:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557252527/real 1557252527] req@ffff932d98292d00 x1632334091451008/t0(0) o106->fir-MDT0001@10.9.101.49@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557252534 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:08:54 fir-md1-s2 kernel: Lustre: 122720:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 68 previous similar messages May 07 11:08:57 fir-md1-s2 kernel: Lustre: 121990:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932811e66900 x1631675448390608/t0(0) o101->2d384d58-fd4c-f6d6-342b-6f9f296484e1@10.9.101.46@o2ib4:2/0 lens 480/568 e 0 to 0 dl 1557252542 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:08:57 fir-md1-s2 kernel: Lustre: 121990:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 07 11:08:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d70a6b22-11d0-75ad-65d1-77429eca863f (at 10.9.108.51@o2ib4) reconnecting May 07 11:09:03 fir-md1-s2 kernel: LustreError: 122151:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.101.49@o2ib4) failed to 
reply to blocking AST (req@ffff933058f0f200 x1632334090666176 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff9316d9219f80/0x1c35e9ec0b05f6b2 lrc: 4/0,0 mode: PR/PR res: [0x240025f5f:0x1a:0x0].0x0 bits 0x40/0x0 rrc: 63 type: IBT flags: 0x60000400000020 nid: 10.9.101.49@o2ib4 remote: 0xcc134f75cc13639 expref: 2204 pid: 122091 timeout: 777804 lvb_type: 0 May 07 11:09:03 fir-md1-s2 kernel: LustreError: 122151:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 07 11:09:03 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.9.101.49@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 07 11:09:03 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message May 07 11:09:03 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.101.49@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff9316d9219f80/0x1c35e9ec0b05f6b2 lrc: 3/0,0 mode: PR/PR res: [0x240025f5f:0x1a:0x0].0x0 bits 0x40/0x0 rrc: 63 type: IBT flags: 0x60000400000020 nid: 10.9.101.49@o2ib4 remote: 0xcc134f75cc13639 expref: 2205 pid: 122091 timeout: 0 lvb_type: 0 May 07 11:09:03 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message May 07 11:11:43 fir-md1-s2 kernel: LNetError: 121176:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 11:12:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9134d6ed-2ca0-ddb8-9f7a-4d783ed8d98e (at 10.9.101.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930712b3f400, cur 1557252733 expire 1557252583 last 1557252506 May 07 11:12:13 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 11:13:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 153ba73b-12e9-b8bc-3b0a-f9a29f253b77 (at 10.8.26.4@o2ib6) in 173 seconds. 
I think it's dead, and I am evicting it. exp ffff931964ea0c00, cur 1557252809 expire 1557252659 last 1557252636 May 07 11:15:46 fir-md1-s2 kernel: LNetError: 121179:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 11:17:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ba256afc-5e3e-5541-48f0-98fd49600415 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e26fd7000, cur 1557253075 expire 1557252925 last 1557252848 May 07 11:17:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 11:18:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.23@o2ib6) May 07 11:18:26 fir-md1-s2 kernel: Lustre: Skipped 16 previous similar messages May 07 11:20:16 fir-md1-s2 kernel: Lustre: 122161:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253209/real 1557253209] req@ffff93190aa38900 x1632334207198272/t0(0) o106->fir-MDT0003@10.8.7.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557253216 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:20:16 fir-md1-s2 kernel: Lustre: 122161:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 45 previous similar messages May 07 11:20:23 fir-md1-s2 kernel: Lustre: 122161:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253216/real 1557253216] req@ffff93190aa38900 x1632334207198272/t0(0) o106->fir-MDT0003@10.8.7.33@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557253223 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:20:23 fir-md1-s2 kernel: Lustre: 122161:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 12 previous similar messages May 07 11:20:24 fir-md1-s2 kernel: Lustre: 122286:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93151fb50f00 x1632898153206880/t0(0) o101->f4f411cb-15ce-1f07-9f23-424f6c4aa426@10.8.27.23@o2ib6:29/0 
lens 480/568 e 1 to 0 dl 1557253229 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:20:24 fir-md1-s2 kernel: Lustre: 122286:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages May 07 11:20:25 fir-md1-s2 kernel: Lustre: 9068:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932d9331c500 x1632898153209824/t0(0) o101->f4f411cb-15ce-1f07-9f23-424f6c4aa426@10.8.27.23@o2ib6:0/0 lens 480/568 e 1 to 0 dl 1557253230 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:20:25 fir-md1-s2 kernel: Lustre: 9068:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages May 07 11:20:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f4f411cb-15ce-1f07-9f23-424f6c4aa426 (at 10.8.27.23@o2ib6) reconnecting May 07 11:20:30 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages May 07 11:20:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 9a46a636-d807-725a-1806-a4c05a6a1620 (at 10.8.18.24@o2ib6) reconnecting May 07 11:20:31 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 11:20:35 fir-md1-s2 kernel: Lustre: 122140:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253228/real 1557253228] req@ffff9311fd69a100 x1632334209118720/t0(0) o106->fir-MDT0003@10.8.18.18@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557253235 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:20:35 fir-md1-s2 kernel: Lustre: 122140:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 22 previous similar messages May 07 11:20:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f4f411cb-15ce-1f07-9f23-424f6c4aa426 (at 10.8.27.23@o2ib6) reconnecting May 07 11:20:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 11:20:54 fir-md1-s2 kernel: Lustre: 122172:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557253246/real 1557253246] req@ffff9315f0bae300 x1632334213303664/t0(0) 
o106->fir-MDT0003@10.8.18.14@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557253253 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:20:54 fir-md1-s2 kernel: Lustre: 122172:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 63 previous similar messages May 07 11:20:54 fir-md1-s2 kernel: Lustre: 122051:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931357220300 x1631547890163472/t0(0) o101->9fbdcf23-b7ca-0a3d-0172-b65213b0edff@10.8.18.21@o2ib6:29/0 lens 480/568 e 0 to 0 dl 1557253259 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:20:59 fir-md1-s2 kernel: LustreError: 122647:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff93150a6e8300 x1632334215215920/t0(0) o104->fir-MDT0003@10.8.17.20@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 May 07 11:20:59 fir-md1-s2 kernel: LustreError: 122647:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages May 07 11:21:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client cf596cc9-7297-c8cd-7acc-7023bcdbec89 (at 10.8.18.23@o2ib6) reconnecting May 07 11:21:00 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 07 11:23:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client cea7a43b-aa43-8a60-b35d-3d2743c001ac (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93193330dc00, cur 1557253413 expire 1557253263 last 1557253186 May 07 11:23:33 fir-md1-s2 kernel: Lustre: Skipped 15 previous similar messages May 07 11:32:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 11:32:13 fir-md1-s2 kernel: Lustre: Skipped 18 previous similar messages May 07 11:33:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f4f411cb-15ce-1f07-9f23-424f6c4aa426 (at 10.8.27.23@o2ib6) in 186 seconds. I think it's dead, and I am evicting it. 
exp ffff9318f5f00c00, cur 1557254014 expire 1557253864 last 1557253828 May 07 11:33:34 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 11:42:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 11:42:49 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 07 11:43:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e8fb0979-89ce-005c-84f4-115c1a5d2ebe (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9300505ce400, cur 1557254630 expire 1557254480 last 1557254403 May 07 11:43:50 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 11:46:26 fir-md1-s2 kernel: Lustre: 9070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557254779/real 1557254779] req@ffff932f01bbd700 x1632334463842752/t0(0) o104->fir-MDT0001@10.9.108.4@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557254786 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 11:46:26 fir-md1-s2 kernel: Lustre: 9070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages May 07 11:46:33 fir-md1-s2 kernel: Lustre: 9070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557254786/real 1557254786] req@ffff932f01bbd700 x1632334463842752/t0(0) o104->fir-MDT0001@10.9.108.4@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557254793 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:46:34 fir-md1-s2 kernel: Lustre: 122069:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9329bea65700 x1632899171482672/t0(0) o36->88e01d34-ce2d-ec7b-0b7d-9a143101b634@10.8.27.23@o2ib6:9/0 lens 496/2888 e 1 to 0 dl 1557254799 ref 2 fl Interpret:/0/0 rc 0/0 May 07 11:46:34 fir-md1-s2 kernel: Lustre: 122069:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages May 07 11:46:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 
88e01d34-ce2d-ec7b-0b7d-9a143101b634 (at 10.8.27.23@o2ib6) reconnecting May 07 11:46:40 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 11:46:47 fir-md1-s2 kernel: Lustre: 9070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557254800/real 1557254800] req@ffff932f01bbd700 x1632334463842752/t0(0) o104->fir-MDT0001@10.9.108.4@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557254807 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 11:46:47 fir-md1-s2 kernel: Lustre: 9070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 11:46:54 fir-md1-s2 kernel: LustreError: 9070:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.108.4@o2ib4) failed to reply to blocking AST (req@ffff932f01bbd700 x1632334463842752 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931f9bf39680/0x1c35e9ec6a270c61 lrc: 4/0,0 mode: PR/PR res: [0x240025f32:0xa23:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.9.108.4@o2ib4 remote: 0x47be192387ddf37 expref: 24 pid: 122009 timeout: 780075 lvb_type: 0 May 07 11:46:54 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.9.108.4@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 07 11:46:54 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.108.4@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff931f9bf39680/0x1c35e9ec6a270c61 lrc: 3/0,0 mode: PR/PR res: [0x240025f32:0xa23:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.9.108.4@o2ib4 remote: 0x47be192387ddf37 expref: 25 pid: 122009 timeout: 0 lvb_type: 0 May 07 11:56:12 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4bf91ef2-656c-56f2-bc8d-6317a85073ff (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b6ee31000, cur 1557255372 expire 1557255222 last 1557255145 May 07 11:56:12 fir-md1-s2 kernel: Lustre: Skipped 6 previous similar messages May 07 11:59:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 11:59:24 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages May 07 12:08:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 02bcd367-45c9-f294-deda-08045d8722c3 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931840e6dc00, cur 1557256138 expire 1557255988 last 1557255911 May 07 12:08:58 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 12:10:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.109.7@o2ib4) May 07 12:10:23 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 07 12:21:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 264a8fc1-0463-2875-ec85-6c3245c570a8 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932e86b07800, cur 1557256896 expire 1557256746 last 1557256669 May 07 12:21:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 12:21:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 12:21:55 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 12:32:26 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d8583d7f-ba8f-1675-e5a4-3f69f2d6da58 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b1c7a5800, cur 1557257546 expire 1557257396 last 1557257319 May 07 12:32:26 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 12:33:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 12:33:00 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 12:43:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 12:43:36 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 12:44:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ab8fc5bf-6f10-4c83-559e-3f6fc2255fd7 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932a73707400, cur 1557258273 expire 1557258123 last 1557258046 May 07 12:44:33 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 12:58:19 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4ff37796-c024-485d-10eb-91250bc869b7 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931759224000, cur 1557259099 expire 1557258949 last 1557258872 May 07 12:58:19 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 12:58:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 12:58:48 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 07 13:02:29 fir-md1-s2 kernel: LNetError: 121170:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 13:09:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 07 13:09:56 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 13:22:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 13:22:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 13:22:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3695ec97-1e33-a94a-7a18-4e8ccb31b79d (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931de963dc00, cur 1557260576 expire 1557260426 last 1557260349 May 07 13:22:56 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 13:30:25 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557261018/real 1557261018] req@ffff930d36ef8000 x1632335493781824/t0(0) o104->fir-MDT0001@10.9.101.43@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557261025 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 13:30:25 fir-md1-s2 kernel: Lustre: 122058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 13:30:27 fir-md1-s2 kernel: Lustre: 122293:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557261020/real 1557261020] req@ffff9334b67c9b00 x1632335494189952/t0(0) o104->fir-MDT0001@10.9.101.43@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557261027 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 13:30:27 fir-md1-s2 kernel: Lustre: 122293:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 07 13:30:32 fir-md1-s2 kernel: Lustre: 122046:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557261025/real 1557261025] req@ffff932431a96c00 x1632335494993248/t0(0) o104->fir-MDT0001@10.9.101.43@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557261032 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 13:30:32 fir-md1-s2 kernel: Lustre: 122046:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 07 13:30:33 fir-md1-s2 kernel: Lustre: 122119:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9323ddedc800 x1631592723404752/t0(0) o101->939c0635-d3e5-7945-6eca-6a92a2676304@10.9.101.4@o2ib4:8/0 lens 480/568 e 1 to 0 dl 1557261038 ref 2 fl Interpret:/0/0 rc 0/0 May 07 13:30:34 fir-md1-s2 kernel: Lustre: 121639:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any 
time (5/5), not sending early reply req@ffff93276ce52a00 x1631675518090800/t0(0) o101->2d384d58-fd4c-f6d6-342b-6f9f296484e1@10.9.101.46@o2ib4:9/0 lens 480/568 e 1 to 0 dl 1557261039 ref 2 fl Interpret:/0/0 rc 0/0 May 07 13:30:34 fir-md1-s2 kernel: Lustre: 121639:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 13:30:35 fir-md1-s2 kernel: Lustre: 121657:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932f3a3e1b00 x1631565792838736/t0(0) o101->59eb580d-79f6-c9fa-7886-60aca8aaf8c9@10.9.101.22@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1557261040 ref 2 fl Interpret:/0/0 rc 0/0 May 07 13:30:35 fir-md1-s2 kernel: Lustre: 121657:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 13:30:37 fir-md1-s2 kernel: Lustre: 122265:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931e357bdd00 x1631538718040832/t0(0) o101->66a54434-eb06-b80f-3630-3aa618c202aa@10.9.108.55@o2ib4:12/0 lens 480/568 e 1 to 0 dl 1557261042 ref 2 fl Interpret:/0/0 rc 0/0 May 07 13:30:37 fir-md1-s2 kernel: Lustre: 122265:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 13:30:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client e3278c0d-d9f3-d354-8ead-4a36d1df71cf (at 10.9.101.30@o2ib4) reconnecting May 07 13:30:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 939c0635-d3e5-7945-6eca-6a92a2676304 (at 10.9.101.4@o2ib4) reconnecting May 07 13:30:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 59eb580d-79f6-c9fa-7886-60aca8aaf8c9 (at 10.9.101.22@o2ib4) reconnecting May 07 13:30:41 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 13:30:42 fir-md1-s2 kernel: Lustre: 122710:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557261035/real 1557261035] req@ffff93371be69800 x1632335496479664/t0(0) 
o104->fir-MDT0001@10.9.101.43@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557261042 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 13:30:42 fir-md1-s2 kernel: Lustre: 122710:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 27 previous similar messages May 07 13:30:45 fir-md1-s2 kernel: Lustre: 122606:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932000e38f00 x1631926452868368/t0(0) o101->7dff8a6e-e3f6-696d-9545-c3ce3c471f1b@10.9.101.1@o2ib4:20/0 lens 480/568 e 0 to 0 dl 1557261050 ref 2 fl Interpret:/0/0 rc 0/0 May 07 13:30:45 fir-md1-s2 kernel: Lustre: 122606:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 07 13:30:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client ff63acf1-7514-0f93-947d-5c9294be9e18 (at 10.9.101.21@o2ib4) reconnecting May 07 13:30:46 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 13:30:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7dff8a6e-e3f6-696d-9545-c3ce3c471f1b (at 10.9.101.1@o2ib4) reconnecting May 07 13:30:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 13:30:53 fir-md1-s2 kernel: LustreError: 122058:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.101.43@o2ib4) failed to reply to blocking AST (req@ffff930d36ef8000 x1632335493781824 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff931276a0ba80/0x1c35e9ed7139093e lrc: 4/0,0 mode: PR/PR res: [0x24002627c:0x4b1:0x0].0x0 bits 0x40/0x0 rrc: 60 type: IBT flags: 0x60000400000020 nid: 10.9.101.43@o2ib4 remote: 0x4c51118508a59d9f expref: 2477 pid: 122147 timeout: 786314 lvb_type: 0 May 07 13:30:53 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.9.101.43@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 07 13:30:53 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.101.43@o2ib4 
ns: mdt-fir-MDT0001_UUID lock: ffff931276a0ba80/0x1c35e9ed7139093e lrc: 3/0,0 mode: PR/PR res: [0x24002627c:0x4b1:0x0].0x0 bits 0x40/0x0 rrc: 60 type: IBT flags: 0x60000400000020 nid: 10.9.101.43@o2ib4 remote: 0x4c51118508a59d9f expref: 2478 pid: 122147 timeout: 0 lvb_type: 0 May 07 13:33:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f208ebb8-4376-4043-6760-d7bbad764cbd (at 10.9.101.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933079043c00, cur 1557261239 expire 1557261089 last 1557261012 May 07 13:33:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 13:36:21 fir-md1-s2 kernel: LNetError: 121172:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) May 07 13:50:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f2044964-9abb-674f-9c85-4840b63d6662 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931a4a214400, cur 1557262227 expire 1557262077 last 1557262000 May 07 13:51:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 13:51:06 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 07 14:01:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.43@o2ib4) May 07 14:01:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:08:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 33e4d383-afe4-7850-9e31-74f1d3fe3003 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9316502dc000, cur 1557263324 expire 1557263174 last 1557263097 May 07 14:08:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:09:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 14:09:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:10:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 38315ffb-5342-ed63-c1e3-0b2c0fe96b7c (at 10.8.10.29@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff932ac33c6800, cur 1557263400 expire 1557263250 last 1557263189 May 07 14:10:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:11:16 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d959b386-a70e-7a16-dcb7-b27fd3011742 (at 10.9.114.5@o2ib4) in 185 seconds. I think it's dead, and I am evicting it. exp ffff93202d2fc000, cur 1557263476 expire 1557263326 last 1557263291 May 07 14:11:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:13:57 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263630/real 1557263630] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263637 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:13:57 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 35 previous similar messages May 07 14:14:04 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263637/real 1557263637] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263644 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:14:05 fir-md1-s2 kernel: Lustre: 122168:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending 
early reply req@ffff931917f80000 x1631530762648560/t0(0) o101->0b49eccd-cda4-7bac-8560-4f28415786a3@10.9.0.62@o2ib4:10/0 lens 480/568 e 1 to 0 dl 1557263650 ref 2 fl Interpret:/0/0 rc 0/0 May 07 14:14:05 fir-md1-s2 kernel: Lustre: 122168:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 07 14:14:11 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263644/real 1557263644] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263651 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:14:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:14:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:14:25 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263658/real 1557263658] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263665 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:14:25 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 07 14:14:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:14:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.0.62@o2ib4) May 07 14:14:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:14:46 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263679/real 1557263679] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263686 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:14:46 fir-md1-s2 
kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 14:14:53 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:15:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:15:28 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263721/real 1557263721] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263728 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:15:28 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages May 07 14:15:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:15:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:16:45 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557263798/real 1557263798] req@ffff93161d7d0600 x1632335923710480/t0(0) o106->fir-MDT0003@10.8.24.5@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1557263805 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:16:45 fir-md1-s2 kernel: Lustre: 122176:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages May 07 14:16:49 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 0b49eccd-cda4-7bac-8560-4f28415786a3 (at 10.9.0.62@o2ib4) reconnecting May 07 14:16:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:17:10 fir-md1-s2 kernel: LNet: Service thread pid 122176 was inactive for 200.35s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes:
May 07 14:17:10 fir-md1-s2 kernel: Pid: 122176, comm: mdt01_064 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
May 07 14:17:10 fir-md1-s2 kernel: Call Trace:
May 07 14:17:10 fir-md1-s2 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
May 07 14:17:10 fir-md1-s2 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
May 07 14:17:10 fir-md1-s2 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
May 07 14:17:10 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
May 07 14:17:10 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
May 07 14:17:10 fir-md1-s2 kernel: [] kthread+0xd1/0xe0
May 07 14:17:10 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
May 07 14:17:10 fir-md1-s2 kernel: [] 0xffffffffffffffff
May 07 14:17:10 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1557263830.122176
May 07 14:17:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 24f54b66-220b-e07b-5df6-a50a2f3914f4 (at 10.8.24.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315d27a2400, cur 1557263835 expire 1557263685 last 1557263608
May 07 14:17:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
May 07 14:17:31 fir-md1-s2 kernel: LNet: Service thread pid 122176 completed after 220.75s.
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). May 07 14:23:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 31568f88-a5da-6bea-eb69-146c5b1b856e (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93004f83ec00, cur 1557264201 expire 1557264051 last 1557263974 May 07 14:23:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:28:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:28:11 fir-md1-s2 kernel: Lustre: Skipped 12 previous similar messages May 07 14:36:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5960584d-8bd1-90dd-6045-b7e6792172b7 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931baeb89800, cur 1557264972 expire 1557264822 last 1557264745 May 07 14:36:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:39:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:39:29 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:41:40 fir-md1-s2 kernel: Lustre: 122024:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557265293/real 1557265293] req@ffff93145b2c3300 x1632336228955072/t0(0) o104->fir-MDT0001@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557265300 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 14:41:40 fir-md1-s2 kernel: Lustre: 122024:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 07 14:41:48 fir-md1-s2 kernel: Lustre: 122222:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93152479a700 x1631391526246656/t0(0) o36->ef5f6b7b-5844-0e36-bcbe-e166ef8ee02e@10.8.25.30@o2ib6:23/0 lens 512/448 e 1 to 0 dl 1557265313 ref 2 fl Interpret:/0/0 rc 0/0 May 07 14:41:54 fir-md1-s2 kernel: 
Lustre: fir-MDT0001: Client ef5f6b7b-5844-0e36-bcbe-e166ef8ee02e (at 10.8.25.30@o2ib6) reconnecting May 07 14:41:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:42:01 fir-md1-s2 kernel: Lustre: 122024:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557265314/real 1557265314] req@ffff93145b2c3300 x1632336228955072/t0(0) o104->fir-MDT0001@10.8.26.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557265321 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 14:42:01 fir-md1-s2 kernel: Lustre: 122024:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 07 14:42:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client ef5f6b7b-5844-0e36-bcbe-e166ef8ee02e (at 10.8.25.30@o2ib6) reconnecting May 07 14:42:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 926e7156-e3ea-d21b-5570-976ffcc525b8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93174ff69400, cur 1557265351 expire 1557265201 last 1557265124 May 07 14:42:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:46:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e5c6e418-9402-b4c8-b17f-405046b45b54 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9308f0a0f000, cur 1557265599 expire 1557265449 last 1557265372 May 07 14:46:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:49:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 14:49:32 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 14:50:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6164810e-2f17-0da4-b2de-82ab2fa31076 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9318bb25b800, cur 1557265802 expire 1557265652 last 1557265575 May 07 14:50:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 14:56:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 96d0d7ea-f6f4-656c-8e7f-84d0f2fccdc9 (at 10.9.108.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93207c63dc00, cur 1557266169 expire 1557266019 last 1557265942 May 07 14:56:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:01:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.24@o2ib4) May 07 15:01:06 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 15:08:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client dfc30246-ecbd-f024-7bb0-28e603f81590 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932538201800, cur 1557266913 expire 1557266763 last 1557266686 May 07 15:08:33 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 15:12:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 15:12:45 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 15:15:32 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b8c209bc-ddc2-5ae2-ef34-ca529f4f7274 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931c9123d800, cur 1557267332 expire 1557267182 last 1557267105 May 07 15:15:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:21:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8a34a7ef-2661-ea2e-15af-98d192d821d8 (at 10.9.108.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931b20ed4c00, cur 1557267697 expire 1557267547 last 1557267470 May 07 15:21:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:22:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 02541ef7-58a2-bedf-b212-270090ac5ab8 (at 10.8.26.4@o2ib6) in 187 seconds. I think it's dead, and I am evicting it. exp ffff93162e21d000, cur 1557267773 expire 1557267623 last 1557267586 May 07 15:22:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:23:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 15:23:15 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 15:28:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4f271412-c3e2-b3b8-adf4-b291b150b7ab (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930e0deea800, cur 1557268094 expire 1557267944 last 1557267867 May 07 15:28:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:30:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5ad2f414-8163-9fb0-ca35-f6a107214b44 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932f7213b400, cur 1557268248 expire 1557268098 last 1557268021 May 07 15:30:48 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:35:21 fir-md1-s2 kernel: Lustre: 122731:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268514/real 1557268514] req@ffff9328426f7500 x1632336805592528/t0(0) o106->fir-MDT0001@10.9.109.43@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268521 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 15:35:21 fir-md1-s2 kernel: Lustre: 122731:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 07 15:35:27 fir-md1-s2 kernel: Lustre: 121635:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268520/real 1557268520] req@ffff9328426f5400 x1632336806581920/t0(0) o106->fir-MDT0003@10.9.109.41@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268527 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 15:35:27 fir-md1-s2 kernel: Lustre: 121635:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 07 15:35:29 fir-md1-s2 kernel: Lustre: 122330:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932837e2ce00 x1631610964661808/t0(0) o101->1c578c74-5128-6e3f-cdf7-83221a90bc4e@10.8.27.8@o2ib6:4/0 lens 480/568 e 1 to 0 dl 1557268534 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:30 fir-md1-s2 kernel: Lustre: 122005:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93201fe91800 x1631642873645552/t0(0) o101->b4dc4310-abd3-57a8-960f-a27b33e667d3@10.8.27.7@o2ib6:5/0 lens 480/568 e 1 to 0 dl 1557268535 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1c578c74-5128-6e3f-cdf7-83221a90bc4e (at 10.8.27.8@o2ib6) reconnecting May 07 15:35:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.27.8@o2ib6) May 07 15:35:35 
fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 07 15:35:41 fir-md1-s2 kernel: Lustre: 121635:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268534/real 1557268534] req@ffff9328426f5400 x1632336806581920/t0(0) o106->fir-MDT0003@10.9.109.41@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268541 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 15:35:41 fir-md1-s2 kernel: Lustre: 121635:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages May 07 15:35:45 fir-md1-s2 kernel: Lustre: 122065:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93349abc1500 x1631535492311136/t0(0) o101->205cae49-b70d-4635-7302-5f62b1c05bbe@10.9.102.18@o2ib4:20/0 lens 480/568 e 0 to 0 dl 1557268550 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:35:45 fir-md1-s2 kernel: Lustre: 122065:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 07 15:35:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 7b074a6a-71d1-b588-1012-cdc05d24d9a2 (at 10.9.106.1@o2ib4) reconnecting May 07 15:35:51 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 15:35:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1c578c74-5128-6e3f-cdf7-83221a90bc4e (at 10.8.27.8@o2ib6) reconnecting May 07 15:35:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:36:01 fir-md1-s2 kernel: Lustre: 122672:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557268554/real 1557268554] req@ffff932bf270c500 x1632336811465744/t0(0) o106->fir-MDT0001@10.9.109.32@o2ib4:15/16 lens 296/280 e 0 to 1 dl 1557268561 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 07 15:36:01 fir-md1-s2 kernel: Lustre: 122672:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages May 07 15:36:02 fir-md1-s2 kernel: Lustre: 121420:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ 
Couldn't add any time (5/5), not sending early reply req@ffff932771316c00 x1631570288556912/t0(0) o101->409782ab-594c-0837-10bd-459bd6e52b7f@10.9.106.26@o2ib4:7/0 lens 480/568 e 1 to 0 dl 1557268567 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:36:02 fir-md1-s2 kernel: Lustre: 121420:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 15:36:07 fir-md1-s2 kernel: Lustre: 122652:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff93199a745700 x1631536061889888/t0(0) o101->77cf16cf-22cf-31cd-be2c-d11a46be6b40@10.8.2.18@o2ib6:12/0 lens 480/568 e 0 to 0 dl 1557268572 ref 2 fl Interpret:/0/0 rc 0/0 May 07 15:36:07 fir-md1-s2 kernel: Lustre: 122652:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 07 15:36:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 409782ab-594c-0837-10bd-459bd6e52b7f (at 10.9.106.26@o2ib4) reconnecting May 07 15:36:08 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 15:36:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 409782ab-594c-0837-10bd-459bd6e52b7f (at 10.9.106.26@o2ib4) reconnecting May 07 15:36:29 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages May 07 15:36:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 15e7b88c-a93b-062c-45b0-856903418d38 (at 10.9.109.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93083f2fe000, cur 1557268593 expire 1557268443 last 1557268366 May 07 15:36:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 15:47:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 15:47:15 fir-md1-s2 kernel: Lustre: Skipped 20 previous similar messages May 07 16:03:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client e6bce195-8b75-8100-43dd-92ae3b1a9171 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9313826ee000, cur 1557270211 expire 1557270061 last 1557269984 May 07 16:03:31 fir-md1-s2 kernel: Lustre: Skipped 65 previous similar messages May 07 16:04:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 16:04:19 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 16:04:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client dda09422-c907-407d-036d-a3a6d44772c2 (at 10.8.10.29@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. exp ffff932230a36000, cur 1557270287 expire 1557270137 last 1557270063 May 07 16:04:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 16:10:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d13425d2-f7f9-4d3b-4526-236128461cab (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932527f59400, cur 1557270614 expire 1557270464 last 1557270387 May 07 16:10:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 16:16:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f10e4c66-3318-da71-f923-7064d0e55a01 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933e81647000, cur 1557270978 expire 1557270828 last 1557270751 May 07 16:16:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 16:19:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 21549ee9-0107-6f79-6660-55210797501b (at 10.8.26.4@o2ib6) May 07 16:19:46 fir-md1-s2 kernel: Lustre: Skipped 69 previous similar messages May 07 16:32:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e8e31406-a81e-9855-5960-72750e6f5a6f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9306883ae000, cur 1557271958 expire 1557271808 last 1557271731 May 07 16:32:38 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 16:35:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 16:35:45 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 16:45:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c1994563-ed97-f834-cb7a-c54fb48168a8 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931add675000, cur 1557272749 expire 1557272599 last 1557272522 May 07 16:45:49 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 16:53:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 040e0374-d725-24d4-2f80-077b03acafdc (at 10.9.109.48@o2ib4) May 07 16:53:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 17:07:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 45f832a0-7d8a-4d46-d891-fd630cd0c7e1 (at 10.9.109.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931deefd1400, cur 1557274037 expire 1557273887 last 1557273810 May 07 17:07:17 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 07 17:08:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4b5bfe68-a7e2-f5c6-e273-5c76b638d869 (at 10.8.26.4@o2ib6) in 164 seconds. I think it's dead, and I am evicting it. exp ffff9323637af000, cur 1557274113 expire 1557273963 last 1557273949 May 07 17:08:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 17:11:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bfd38f5b-ccd8-05cb-0d7c-c4305577818f (at 10.9.109.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9339cf737800, cur 1557274299 expire 1557274149 last 1557274072 May 07 17:11:39 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 17:12:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 40e0b6bc-4241-1c2f-b5d9-fe91b0eeeeee (at 10.9.109.43@o2ib4) May 07 17:12:42 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 17:20:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 55c0f714-bb69-9ec5-a06f-f03c78c234ec (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931d14654c00, cur 1557274842 expire 1557274692 last 1557274615 May 07 17:20:42 fir-md1-s2 kernel: Lustre: Skipped 51 previous similar messages May 07 17:22:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 53dc128b-5d9b-8aa7-2699-5120d42e191c (at 10.9.109.24@o2ib4) May 07 17:22:51 fir-md1-s2 kernel: Lustre: Skipped 37 previous similar messages May 07 17:32:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0d98b6a6-b21c-6c5f-c579-036b4dd581bc (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f3c3d8c00, cur 1557275536 expire 1557275386 last 1557275309 May 07 17:32:16 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 07 17:36:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 17:36:54 fir-md1-s2 kernel: Lustre: Skipped 19 previous similar messages May 07 17:45:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f9323ba0-64df-6cda-6dfb-dd11bf2657f0 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9315022fb000, cur 1557276320 expire 1557276170 last 1557276093 May 07 17:45:20 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 07 17:49:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 17:49:05 fir-md1-s2 kernel: Lustre: Skipped 15 previous similar messages May 07 17:57:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a5946be7-16af-7137-90f1-3ba3582c1591 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930ce536e400, cur 1557277026 expire 1557276876 last 1557276799 May 07 17:57:06 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 18:00:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 18:00:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 18:09:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client db97291c-0e58-1db8-997e-93e0079f42e9 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932497b11400, cur 1557277767 expire 1557277617 last 1557277540 May 07 18:09:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 18:12:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 18:12:47 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 18:20:17 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d654f0f4-19f6-d029-e9b9-629b432c9609 (at 10.9.112.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff933d17a33c00, cur 1557278417 expire 1557278267 last 1557278190 May 07 18:20:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 18:25:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.15.4@o2ib6) May 07 18:25:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 18:41:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6c0661ac-98eb-50dd-0f02-14b2a8b09ba5 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930c853a8800, cur 1557279702 expire 1557279552 last 1557279475 May 07 18:41:42 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 07 18:46:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 18:46:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 18:50:18 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 785a0a0b-c618-c066-907a-762b25db5bf2 (at 10.8.15.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932c35b2b000, cur 1557280218 expire 1557280068 last 1557279991 May 07 18:50:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:06:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f4b2b27d-5164-df59-9352-2a8dbf5b8bcd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b7c522800, cur 1557281186 expire 1557281036 last 1557280959 May 07 19:06:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:07:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 19:07:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:13:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 7bd6652b-ae50-5135-d32c-c0df332b2aef (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930d196e5000, cur 1557281596 expire 1557281446 last 1557281369 May 07 19:13:16 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:15:11 fir-md1-s2 kernel: Lustre: 122134:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93170b358000 x1631770339511376/t0(0) o36->d78976e9-8fde-067a-7427-e1d7715d5787@10.8.20.22@o2ib6:16/0 lens 512/448 e 1 to 0 dl 1557281716 ref 2 fl Interpret:/0/0 rc 0/0 May 07 19:15:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d78976e9-8fde-067a-7427-e1d7715d5787 (at 10.8.20.22@o2ib6) reconnecting May 07 19:15:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.22@o2ib6) May 07 19:15:28 fir-md1-s2 kernel: Lustre: 122070:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557281696/real 1557281696] req@ffff931811e27b00 x1632339185198336/t0(0) o104->fir-MDT0001@10.8.23.14@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557281727 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 07 19:15:28 fir-md1-s2 kernel: Lustre: 122070:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 43 previous similar messages May 07 19:15:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d78976e9-8fde-067a-7427-e1d7715d5787 (at 10.8.20.22@o2ib6) reconnecting May 07 19:15:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.22@o2ib6) May 07 19:15:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9a56a19f-ae1b-ace0-70a7-6d8e93c71c2f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931591ed7800, cur 1557281752 expire 1557281602 last 1557281525 May 07 19:15:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:16:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 07 19:16:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:17:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 19:17:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:47:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c7854b01-d446-4b98-3dd0-a771eb9a48dc (at 10.8.1.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315f6e92000, cur 1557283672 expire 1557283522 last 1557283445 May 07 19:47:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:52:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 277ff991-6cec-64c6-c68e-c37b2537028b (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931c22e57800, cur 1557283974 expire 1557283824 last 1557283747 May 07 19:52:54 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 19:57:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 19:57:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:05:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client fea369df-1415-a320-0427-042cc66dfb9c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff931d54774c00, cur 1557284722 expire 1557284572 last 1557284495 May 07 20:05:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:09:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:09:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:17:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2f4ebf14-bc7c-5c22-d321-ba8a7f17a339 (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f1cf25800, cur 1557285462 expire 1557285312 last 1557285235 May 07 20:17:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:19:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.11@o2ib6) May 07 20:20:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:20:40 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 07 20:31:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c29839db-474d-e164-4d2b-80ceb9a5f39c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932064657800, cur 1557286268 expire 1557286118 last 1557286041 May 07 20:31:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:35:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:35:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:45:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c63fa75e-db43-02cb-1e97-c490767a8a3c (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93152da79c00, cur 1557287145 expire 1557286995 last 1557286918 May 07 20:45:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 20:45:50 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c63fa75e-db43-02cb-1e97-c490767a8a3c (at 10.8.10.29@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9322f9e38c00, cur 1557287150 expire 1557287000 last 1557286923 May 07 20:49:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 20:49:08 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 22:18:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 209fb82a-ea02-ddd0-64cc-71f136a8206e (at 10.8.25.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b4caabc00, cur 1557292731 expire 1557292581 last 1557292504 May 07 22:31:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 689c4a40-670a-db67-b69e-ba61a332674f (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932fa7528c00, cur 1557293473 expire 1557293323 last 1557293246 May 07 22:31:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 22:35:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.29@o2ib6) May 07 22:35:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 22:43:02 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 740b1e7f-c950-a6f4-1c52-52725a57adef (at 10.9.102.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930712b3c800, cur 1557294182 expire 1557294032 last 1557293955 May 07 22:43:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 22:46:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.25.5@o2ib6) May 07 22:46:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 07 23:10:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.102.1@o2ib4) May 07 23:10:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 01:33:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c6c69886-c1e4-472e-425d-70c33f0565ee (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932ff4250400, cur 1557304411 expire 1557304261 last 1557304184 May 08 01:33:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 01:33:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c6c69886-c1e4-472e-425d-70c33f0565ee (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9314da607400, cur 1557304421 expire 1557304271 last 1557304194 May 08 01:34:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 01:34:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 01:35:37 fir-md1-s2 kernel: Lustre: 122148:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304530/real 1557304530] req@ffff93362d32e900 x1632343339932608/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557304537 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 01:35:37 fir-md1-s2 kernel: Lustre: 122148:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 01:35:38 fir-md1-s2 kernel: Lustre: 122122:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304531/real 1557304531] req@ffff9316d5bac500 x1632343340146112/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557304538 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 01:35:38 fir-md1-s2 kernel: Lustre: 122122:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 08 01:35:40 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304533/real 1557304533] req@ffff930e83f1cb00 x1632343340448112/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557304540 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 01:35:40 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages May 08 
01:35:42 fir-md1-s2 kernel: Lustre: 122346:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304535/real 1557304535] req@ffff931ac9314b00 x1632343340895728/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557304542 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 01:35:42 fir-md1-s2 kernel: Lustre: 122346:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 01:35:45 fir-md1-s2 kernel: Lustre: 122051:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931ae43f0600 x1631552077024656/t0(0) o101->7f1d72e1-9729-0e4a-74ef-0e1a51d05d62@10.9.101.50@o2ib4:20/0 lens 480/568 e 1 to 0 dl 1557304550 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:35:46 fir-md1-s2 kernel: Lustre: 121656:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93150ba09200 x1631550798667696/t0(0) o101->d306ff79-fb9e-2f98-a900-35120cbd847f@10.9.101.7@o2ib4:21/0 lens 480/568 e 1 to 0 dl 1557304551 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:35:46 fir-md1-s2 kernel: Lustre: 121656:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 08 01:35:48 fir-md1-s2 kernel: Lustre: 122862:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304541/real 1557304541] req@ffff93363678e300 x1632343340754656/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557304548 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 01:35:48 fir-md1-s2 kernel: Lustre: 122862:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 14 previous similar messages May 08 01:35:48 fir-md1-s2 kernel: Lustre: 122706:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93360ce80600 x1631641704494576/t0(0) o101->9254edf4-eb93-5012-62de-675136350f72@10.9.108.56@o2ib4:23/0 lens 
480/568 e 1 to 0 dl 1557304553 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:35:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a33810e4-5bd0-c297-1a4d-1cbd7ad1d09a (at 10.9.101.53@o2ib4) reconnecting May 08 01:35:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.50@o2ib4) May 08 01:35:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 01:35:51 fir-md1-s2 kernel: Lustre: 122643:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff93101bb21800 x1631641704494640/t0(0) o101->9254edf4-eb93-5012-62de-675136350f72@10.9.108.56@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1557304556 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:35:51 fir-md1-s2 kernel: Lustre: 122643:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 08 01:35:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d306ff79-fb9e-2f98-a900-35120cbd847f (at 10.9.101.7@o2ib4) reconnecting May 08 01:35:52 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 01:35:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.7@o2ib4) May 08 01:35:52 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 08 01:35:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 08 01:35:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 01:35:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.51@o2ib4) May 08 01:35:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 01:35:56 fir-md1-s2 kernel: Lustre: 122235:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff9336373a9200 x1631548957585856/t0(0) o101->f97cfae5-aafc-8930-1f52-c15218589c16@10.9.108.37@o2ib4:1/0 lens 480/568 e 0 to 0 dl 1557304561 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:35:56 fir-md1-s2 
kernel: Lustre: 122235:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 08 01:35:58 fir-md1-s2 kernel: Lustre: 122002:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557304551/real 1557304551] req@ffff931fa2ed7800 x1632343339935328/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557304558 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 01:35:58 fir-md1-s2 kernel: Lustre: 122002:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 27 previous similar messages May 08 01:36:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 2200fcb4-d39b-78c8-37f6-08d710935adf (at 10.9.101.42@o2ib4) reconnecting May 08 01:36:00 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 01:36:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.42@o2ib4) May 08 01:36:00 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 01:36:04 fir-md1-s2 kernel: Lustre: 121987:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931f9fca9b00 x1631549256036016/t0(0) o101->ff5adeab-1158-6b77-bf0c-cf0414b48364@10.9.101.55@o2ib4:9/0 lens 480/568 e 1 to 0 dl 1557304569 ref 2 fl Interpret:/0/0 rc 0/0 May 08 01:36:04 fir-md1-s2 kernel: Lustre: 121987:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 08 01:36:05 fir-md1-s2 kernel: LustreError: 122148:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.101.1@o2ib4) failed to reply to blocking AST (req@ffff93362d32e900 x1632343339932608 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff930dfa31a400/0x1c35e9ee1ee1c4b8 lrc: 4/0,0 mode: PR/PR res: [0x240025f1e:0xf4d:0x0].0x0 bits 0x40/0x0 rrc: 59 type: IBT flags: 0x60000400000020 nid: 10.9.101.1@o2ib4 remote: 0x75d83f4b9f93b4ed expref: 2539 pid: 122639 timeout: 829825 lvb_type: 0 May 08 01:36:05 fir-md1-s2 kernel: LustreError: 
138-a: fir-MDT0001: A client on nid 10.9.101.1@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 08 01:36:05 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.101.1@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff930fdd78cec0/0x1c35e9ee1ee1a355 lrc: 3/0,0 mode: PR/PR res: [0x240026187:0x6b7:0x0].0x0 bits 0x40/0x0 rrc: 64 type: IBT flags: 0x60000400000020 nid: 10.9.101.1@o2ib4 remote: 0x75d83f4b9f930b11 expref: 2540 pid: 122639 timeout: 0 lvb_type: 0 May 08 01:36:05 fir-md1-s2 kernel: LustreError: 122148:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 08 01:36:05 fir-md1-s2 kernel: LustreError: 121419:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff932eda703f00 x1632343346100928/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 May 08 01:36:06 fir-md1-s2 kernel: LustreError: 122148:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff93363bacc800 x1632343346190864/t0(0) o104->fir-MDT0001@10.9.101.1@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 May 08 01:39:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7dff8a6e-e3f6-696d-9545-c3ce3c471f1b (at 10.9.101.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93150ef20000, cur 1557304755 expire 1557304605 last 1557304528 May 08 01:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 1af90e45-c644-3e4c-21cf-a7be75ad0893 (at 10.8.23.14@o2ib6) in 182 seconds. I think it's dead, and I am evicting it. exp ffff932188b1c400, cur 1557304831 expire 1557304681 last 1557304649 May 08 01:41:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 1af90e45-c644-3e4c-21cf-a7be75ad0893 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93120c33f800, cur 1557304876 expire 1557304726 last 1557304649 May 08 01:41:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 01:41:19 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 08 01:53:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d29c4427-36f7-2607-18d7-9a97ca711ca0 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930f4d306400, cur 1557305610 expire 1557305460 last 1557305383 May 08 01:53:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 01:53:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:05:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0eb998ea-51bd-a0fa-2f8b-6090ae7a33f4 (at 10.9.101.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315c335a400, cur 1557306344 expire 1557306194 last 1557306117 May 08 02:05:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:05:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0eb998ea-51bd-a0fa-2f8b-6090ae7a33f4 (at 10.9.101.23@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93106a6c0400, cur 1557306358 expire 1557306208 last 1557306131 May 08 02:05:58 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 02:06:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.1@o2ib4) May 08 02:06:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:06:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 02:06:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:07:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 150c717d-fe27-1f82-6705-4c7280878b47 (at 10.8.23.14@o2ib6) in 203 seconds. I think it's dead, and I am evicting it. 
exp ffff93196cbbd400, cur 1557306420 expire 1557306270 last 1557306217 May 08 02:07:00 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 02:07:24 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 150c717d-fe27-1f82-6705-4c7280878b47 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930d98ad0000, cur 1557306444 expire 1557306294 last 1557306217 May 08 02:23:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 02:23:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:24:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9d757871-4b65-c1c5-b90b-b7792929cc38 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930c87b08c00, cur 1557307480 expire 1557307330 last 1557307253 May 08 02:28:25 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client bde0178d-16e8-13c4-36ce-9a7655cac98f (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930fa83da400, cur 1557307705 expire 1557307555 last 1557307478 May 08 02:28:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:28:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 02:28:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:28:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.28@o2ib4) May 08 02:28:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:33:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 45c395c4-5d70-fd48-c52d-0035f6dfa5b4 (at 10.9.101.9@o2ib4) May 08 02:33:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:34:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client eed2a5c8-1fbf-9380-11c4-6cc54c07792d (at 10.8.1.18@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9315e43dd800, cur 1557308065 expire 1557307915 last 1557307838 May 08 02:34:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:37:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 0eb998ea-51bd-a0fa-2f8b-6090ae7a33f4 (at 10.9.101.23@o2ib4) May 08 02:37:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 02:37:46 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 80425c46-79e7-b0ae-c3b8-a93f00e15555 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931fe2ad5400, cur 1557308266 expire 1557308116 last 1557308039 May 08 02:37:46 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages May 08 02:38:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 02:38:01 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 03:01:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.6@o2ib6) May 08 03:01:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 03:01:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.12@o2ib6) May 08 03:01:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 03:02:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.18@o2ib6) May 08 03:02:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 03:02:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.4@o2ib6) May 08 03:02:21 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 08 03:08:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.13@o2ib6) May 08 03:08:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 03:45:22 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557312315/real 1557312315] req@ffff9336e4bb3f00 
x1632344714044240/t0(0) o104->fir-MDT0001@10.9.108.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557312322 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 03:45:22 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 25 previous similar messages May 08 03:45:29 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557312322/real 1557312322] req@ffff9336e4bb3f00 x1632344714044240/t0(0) o104->fir-MDT0001@10.9.108.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557312329 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 03:45:30 fir-md1-s2 kernel: Lustre: 122706:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9336eabb6300 x1632890082914208/t0(0) o36->3737afa2-fa03-6f36-4f5c-9c984231c7f1@10.9.101.37@o2ib4:5/0 lens 496/448 e 1 to 0 dl 1557312335 ref 2 fl Interpret:/0/0 rc 0/0 May 08 03:45:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 3737afa2-fa03-6f36-4f5c-9c984231c7f1 (at 10.9.101.37@o2ib4) reconnecting May 08 03:45:36 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 03:45:36 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557312329/real 1557312329] req@ffff9336e4bb3f00 x1632344714044240/t0(0) o104->fir-MDT0001@10.9.108.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557312336 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 03:45:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 90cbff3a-0d00-0704-aee2-84ded66fce8f (at 10.9.101.37@o2ib4) May 08 03:45:36 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 03:45:37 fir-md1-s2 kernel: Lustre: 122048:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff930f98203c00 x1631667855827136/t0(0) o101->9c0c610d-0561-9b9a-98c4-0ad0384caf27@10.9.101.51@o2ib4:12/0 lens 576/3264 e 1 to 0 dl 
1557312342 ref 2 fl Interpret:/0/0 rc 0/0 May 08 03:45:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9c0c610d-0561-9b9a-98c4-0ad0384caf27 (at 10.9.101.51@o2ib4) reconnecting May 08 03:45:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.51@o2ib4) May 08 03:45:50 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557312343/real 1557312343] req@ffff9336e4bb3f00 x1632344714044240/t0(0) o104->fir-MDT0001@10.9.108.49@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557312350 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 03:45:50 fir-md1-s2 kernel: Lustre: 122147:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 03:45:50 fir-md1-s2 kernel: LustreError: 122147:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.108.49@o2ib4) failed to reply to blocking AST (req@ffff9336e4bb3f00 x1632344714044240 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff93405c212d00/0x1c35e9ee1e2a8c20 lrc: 4/0,0 mode: PR/PR res: [0x240006115:0x78ba:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.108.49@o2ib4 remote: 0x2aa6630975ea7502 expref: 30 pid: 122235 timeout: 837610 lvb_type: 0 May 08 03:45:50 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.9.108.49@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 08 03:45:50 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message May 08 03:45:50 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.108.49@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff93405c212d00/0x1c35e9ee1e2a8c20 lrc: 3/0,0 mode: PR/PR res: [0x240006115:0x78ba:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.9.108.49@o2ib4 remote: 0x2aa6630975ea7502 expref: 31 pid: 122235 timeout: 0 lvb_type: 0 May 08 03:46:23 fir-md1-s2 kernel: 
Lustre: fir-MDT0003: haven't heard from client b23f379f-72a3-1922-6d02-f29474e14fa4 (at 10.9.108.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93207c39fc00, cur 1557312383 expire 1557312233 last 1557312156 May 08 03:46:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 04:16:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.49@o2ib4) May 08 04:16:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 04:18:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 04:18:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 04:18:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 10ab4e14-a395-92da-8319-8f900ac08944 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9316e96a0400, cur 1557314331 expire 1557314181 last 1557314104 May 08 04:19:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 10ab4e14-a395-92da-8319-8f900ac08944 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932e51cd1400, cur 1557314345 expire 1557314195 last 1557314118 May 08 04:42:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4f066b5e-4fb2-edc0-59a3-4bb970d01f67 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932047385c00, cur 1557315753 expire 1557315603 last 1557315526 May 08 04:46:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 08 04:46:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 05:05:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ce0f7007-d505-ed89-0b36-6f859b7c8ef2 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930676fdc000, cur 1557317159 expire 1557317009 last 1557316932 May 08 05:05:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 05:06:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ce0f7007-d505-ed89-0b36-6f859b7c8ef2 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93136ca49000, cur 1557317166 expire 1557317016 last 1557316939 May 08 05:09:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 08 05:09:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 07:11:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 195c8595-9c38-b754-9d29-9f4dc81cde91 (at 10.9.101.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f3bffb800, cur 1557324687 expire 1557324537 last 1557324460 May 08 07:11:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 195c8595-9c38-b754-9d29-9f4dc81cde91 (at 10.9.101.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9310738ec800, cur 1557324698 expire 1557324548 last 1557324471 May 08 07:20:48 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 4733b776-dee5-b68f-afc7-95bff939ffcb (at 10.9.102.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b47377800, cur 1557325248 expire 1557325098 last 1557325021 May 08 07:39:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 195c8595-9c38-b754-9d29-9f4dc81cde91 (at 10.9.101.35@o2ib4) May 08 07:39:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 07:45:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.102.70@o2ib4) May 08 07:45:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 08:01:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 596fa154-8fdb-18a8-fa7d-8129544c0d55 (at 10.8.14.2@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff931c98238c00, cur 1557327679 expire 1557327529 last 1557327452 May 08 08:01:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 08:21:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 94ff073b-78cb-a7d4-d32c-307d83b6df33 (at 10.8.23.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9315e46f5400, cur 1557328895 expire 1557328745 last 1557328668 May 08 08:21:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 08:27:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 47468953-81c7-2310-9c7c-4b0493347857 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93168f3c1800, cur 1557329238 expire 1557329088 last 1557329011 May 08 08:27:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 08:36:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 08 08:36:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 08:49:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.9@o2ib6) May 08 08:49:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 09:15:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0799f8ec-71a3-852c-92f0-8f09c24981f0 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93218d665c00, cur 1557332109 expire 1557331959 last 1557331882 May 08 09:15:09 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 09:24:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 08 09:24:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 09:40:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6843ac6f-804b-c34b-714d-77bb2b66baeb (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff930e5ce16c00, cur 1557333635 expire 1557333485 last 1557333408 May 08 09:40:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 09:49:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 08 09:49:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:05:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8c4b439a-dcfe-1ea3-7954-703f216b7103 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff933446af5000, cur 1557335145 expire 1557334995 last 1557334918 May 08 10:05:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:06:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8c4b439a-dcfe-1ea3-7954-703f216b7103 (at 10.9.114.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93005a9f2800, cur 1557335164 expire 1557335014 last 1557334937 May 08 10:10:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.114.5@o2ib4) May 08 10:10:53 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:30:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c6da3458-72ef-a673-d8a4-96c49b3c1790 (at 10.8.1.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93263124ec00, cur 1557336638 expire 1557336488 last 1557336411 May 08 10:31:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 10:31:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:33:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.43@o2ib4) May 08 10:33:42 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:37:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.109.7@o2ib4) May 08 10:37:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:38:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 4c375d9c-371c-6c72-a23b-22f94282d1a6 (at 10.9.108.4@o2ib4) May 08 10:38:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:39:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 66e1a9ab-9220-f034-4a7b-9cb08b8b8802 (at 10.9.103.40@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93207c63ec00, cur 1557337199 expire 1557337049 last 1557336972 May 08 10:39:59 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 08 10:48:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 6438e03d-82a5-e48c-d12e-59dbfd034770 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931f70663c00, cur 1557337724 expire 1557337574 last 1557337497 May 08 10:48:44 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:49:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 10:49:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:52:06 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3307997c-545e-5ef6-bb14-9c53dab7feb6 (at 10.9.105.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93273ba52400, cur 1557337926 expire 1557337776 last 1557337699 May 08 10:52:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:53:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 81ba03cb-100f-75de-7e7c-84a709257c80 (at 10.9.108.7@o2ib4) in 224 seconds. I think it's dead, and I am evicting it. exp ffff9315e42f6000, cur 1557338002 expire 1557337852 last 1557337778 May 08 10:53:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 10:56:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 76f46682-dce3-bb75-00b0-3db97b46fa9d (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff931b227dfc00, cur 1557338201 expire 1557338051 last 1557337974 May 08 10:56:41 fir-md1-s2 kernel: Lustre: Skipped 5 previous similar messages May 08 10:56:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 10:56:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:01:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.4@o2ib6) May 08 11:01:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:07:30 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 7a320afd-28b5-793f-1e06-c4366476cc67 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff93075d61dc00, cur 1557338850 expire 1557338700 last 1557338623 May 08 11:07:30 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:07:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 7a320afd-28b5-793f-1e06-c4366476cc67 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff932ddc2acc00, cur 1557338856 expire 1557338706 last 1557338629 May 08 11:07:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.103.40@o2ib4) May 08 11:07:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:07:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 11:07:50 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:18:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d961e6dd-82a1-6a6a-5a23-8556e5335ccd (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff932e17635c00, cur 1557339526 expire 1557339376 last 1557339299 May 08 11:19:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.23.14@o2ib6) May 08 11:19:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:20:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.105.3@o2ib4) May 08 11:20:24 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:22:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client dcabd621-66cd-08f1-3eb7-174dbf08b0d9 (at 10.8.1.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9311af334400, cur 1557339747 expire 1557339597 last 1557339520 May 08 11:22:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:22:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.7@o2ib4) May 08 11:22:55 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:23:12 fir-md1-s2 kernel: Lustre: 122151:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557339785/real 1557339785] req@ffff933015a70c00 x1632349497905776/t0(0) o104->fir-MDT0001@10.9.101.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557339792 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 11:23:12 fir-md1-s2 kernel: Lustre: 122151:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message May 08 11:23:15 fir-md1-s2 kernel: Lustre: 122605:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557339788/real 1557339788] req@ffff931abaaa1500 x1632349498264000/t0(0) o104->fir-MDT0001@10.9.101.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557339795 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 11:23:15 fir-md1-s2 kernel: Lustre: 122605:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages May 08 11:23:20 fir-md1-s2 kernel: Lustre: 122305:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931895f0b600 x1631639087993648/t0(0) o101->97b55247-502a-4b0e-6bc7-38b7d7a6fbce@10.9.101.31@o2ib4:25/0 lens 480/568 e 1 to 0 dl 1557339805 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:23:20 fir-md1-s2 kernel: Lustre: 122156:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557339793/real 1557339793] req@ffff933993f20300 x1632349498020192/t0(0) o104->fir-MDT0001@10.9.101.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557339800 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 11:23:20 fir-md1-s2 kernel: Lustre: 
122156:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages May 08 11:23:21 fir-md1-s2 kernel: Lustre: 122008:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9306002b7200 x1631595800068016/t0(0) o101->c429ac87-0af5-acec-8a40-5e6c2e99ccb1@10.9.101.8@o2ib4:25/0 lens 480/568 e 1 to 0 dl 1557339805 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:23:21 fir-md1-s2 kernel: Lustre: 122008:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 08 11:23:22 fir-md1-s2 kernel: Lustre: 122289:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff931229297500 x1631715114243536/t0(0) o101->9101e47c-5087-9ebf-bb20-6ff2bf817bf0@10.9.101.32@o2ib4:27/0 lens 480/568 e 1 to 0 dl 1557339807 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:23:22 fir-md1-s2 kernel: Lustre: 122289:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message May 08 11:23:24 fir-md1-s2 kernel: Lustre: 122023:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff932033dcb900 x1631552673515536/t0(0) o101->fb8f22c1-ceb3-fa39-aea4-695a494d32c5@10.9.101.26@o2ib4:29/0 lens 480/568 e 1 to 0 dl 1557339809 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:23:24 fir-md1-s2 kernel: Lustre: 122023:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages May 08 11:23:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client c429ac87-0af5-acec-8a40-5e6c2e99ccb1 (at 10.9.101.8@o2ib4) reconnecting May 08 11:23:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to c429ac87-0af5-acec-8a40-5e6c2e99ccb1 (at 10.9.101.8@o2ib4) May 08 11:23:26 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:23:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client e891cc28-9c10-be1b-29fe-00592513d891 (at 10.9.101.41@o2ib4) reconnecting May 08 11:23:27 
fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:23:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 9101e47c-5087-9ebf-bb20-6ff2bf817bf0 (at 10.9.101.32@o2ib4) reconnecting May 08 11:23:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:23:29 fir-md1-s2 kernel: Lustre: 122867:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff933fea400c00 x1631549646312784/t0(0) o101->ff5adeab-1158-6b77-bf0c-cf0414b48364@10.9.101.55@o2ib4:4/0 lens 480/568 e 1 to 0 dl 1557339814 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:23:29 fir-md1-s2 kernel: Lustre: 122867:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages May 08 11:23:30 fir-md1-s2 kernel: Lustre: 122654:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557339803/real 1557339803] req@ffff931b80f8bc00 x1632349499245136/t0(0) o104->fir-MDT0001@10.9.101.29@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557339810 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 11:23:30 fir-md1-s2 kernel: Lustre: 122654:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 32 previous similar messages May 08 11:23:30 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fb8f22c1-ceb3-fa39-aea4-695a494d32c5 (at 10.9.101.26@o2ib4) reconnecting May 08 11:23:30 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages May 08 11:23:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client ff5adeab-1158-6b77-bf0c-cf0414b48364 (at 10.9.101.55@o2ib4) reconnecting May 08 11:23:35 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages May 08 11:23:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.55@o2ib4) May 08 11:23:35 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages May 08 11:23:38 fir-md1-s2 kernel: Lustre: 122064:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff9317f2fa7200 
x1631671259387776/t0(0) o101->42a1eb99-e9c5-e517-736c-3ba8980aeef3@10.9.101.25@o2ib4:13/0 lens 480/568 e 1 to 0 dl 1557339823 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:23:38 fir-md1-s2 kernel: Lustre: 122064:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages May 08 11:23:40 fir-md1-s2 kernel: LustreError: 122073:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.101.29@o2ib4) failed to reply to blocking AST (req@ffff9318da617200 x1632349497908928 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff932a11bc6c00/0x1c35e9ee4f44bc6e lrc: 4/0,0 mode: PR/PR res: [0x240026189:0xba9:0x0].0x0 bits 0x40/0x0 rrc: 62 type: IBT flags: 0x60000400000020 nid: 10.9.101.29@o2ib4 remote: 0xa204eb879b9dd9a1 expref: 2059 pid: 122284 timeout: 865081 lvb_type: 0 May 08 11:23:40 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.9.101.29@o2ib4 was evicted due to a lock blocking callback time out: rc -110 May 08 11:23:40 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 35s: evicting client at 10.9.101.29@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff93301da42640/0x1c35e9ee4f44ae59 lrc: 3/0,0 mode: PR/PR res: [0x240025746:0x8160:0x0].0x0 bits 0x40/0x0 rrc: 62 type: IBT flags: 0x60000400000020 nid: 10.9.101.29@o2ib4 remote: 0xa204eb879b9da345 expref: 2060 pid: 122048 timeout: 0 lvb_type: 0 May 08 11:23:40 fir-md1-s2 kernel: LustreError: 122073:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message May 08 11:23:43 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 98099f76-bead-25a3-00a0-0c1f8bda0881 (at 10.9.106.27@o2ib4) in 208 seconds. I think it's dead, and I am evicting it. 
exp ffff9320788f5800, cur 1557339823 expire 1557339673 last 1557339615 May 08 11:23:43 fir-md1-s2 kernel: Lustre: Skipped 31 previous similar messages May 08 11:26:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.1.9@o2ib6) May 08 11:26:02 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages May 08 11:26:34 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 773c3f88-b988-c40f-0857-260c7cbe8aa4 (at 10.9.101.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff930712b38000, cur 1557339994 expire 1557339844 last 1557339767 May 08 11:26:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:30:50 fir-md1-s2 kernel: LustreError: 121996:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.23.14@o2ib6) returned error from blocking AST (req@ffff93060eaa8000 x1632349562187952 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff931d7f350240/0x1c35e9ee4ee44723 lrc: 4/0,0 mode: PR/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 912 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x4466d9bcc3c87c8f expref: 6 pid: 121577 timeout: 865638 lvb_type: 0 May 08 11:30:50 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.23.14@o2ib6 was evicted due to a lock blocking callback time out: rc -107 May 08 11:30:50 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message May 08 11:30:50 fir-md1-s2 kernel: LustreError: 121399:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.8.23.14@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff931d7f350240/0x1c35e9ee4ee44723 lrc: 3/0,0 mode: PR/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 1005 type: IBT flags: 0x60200400000020 nid: 10.8.23.14@o2ib6 remote: 0x4466d9bcc3c87c8f expref: 7 pid: 121577 timeout: 0 lvb_type: 0 May 08 11:31:01 fir-md1-s2 kernel: Lustre: 121996:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent 
has timed out for slow reply: [sent 1557340250/real 1557340250] req@ffff93060eaacb00 x1632349562187936/t0(0) o104->fir-MDT0003@10.9.105.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557340261 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 11:31:01 fir-md1-s2 kernel: Lustre: 121996:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 50 previous similar messages May 08 11:31:15 fir-md1-s2 kernel: Lustre: 122271:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff933e88768300 x1631673550595792/t0(0) o101->ac39260b-aa99-2050-abe9-06dcf51e927b@10.9.101.47@o2ib4:20/0 lens 576/3264 e 0 to 0 dl 1557340280 ref 2 fl Interpret:/0/0 rc 0/0 May 08 11:31:15 fir-md1-s2 kernel: Lustre: 122271:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 216 previous similar messages May 08 11:31:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 74b14fbc-b3cc-e771-44b0-23517ef5c46c (at 10.9.0.1@o2ib4) reconnecting May 08 11:31:21 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages May 08 11:31:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.9.0.1@o2ib4) May 08 11:31:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8d951c4e-09d2-9a6e-d7d2-479856ebd844 (at 10.8.23.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9315f5664c00, cur 1557340285 expire 1557340135 last 1557340058 May 08 11:31:37 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 354991dd-8925-b813-15e6-a487911db4c8 (at 10.9.115.6@o2ib4) reconnecting May 08 11:31:37 fir-md1-s2 kernel: Lustre: Skipped 205 previous similar messages May 08 11:31:45 fir-md1-s2 kernel: Lustre: 121996:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557340294/real 1557340294] req@ffff93060eaacb00 x1632349562187936/t0(0) o104->fir-MDT0003@10.9.105.3@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1557340305 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 May 08 11:31:45 fir-md1-s2 kernel: Lustre: 121996:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages May 08 11:31:47 fir-md1-s2 kernel: Lustre: 122033:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff932efa7aa100 x1631535342269568/t0(0) o101->400d6bb2-cc30-d980-7d8b-e0cf4a3a30a0@10.9.107.21@o2ib4:22/0 lens 584/0 e 0 to 0 dl 1557340312 ref 2 fl New:/2/ffffffff rc 0/-1 May 08 11:31:47 fir-md1-s2 kernel: Lustre: 122033:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 557 previous similar messages May 08 11:32:09 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client e0767d77-866c-9038-3794-0af657e399d1 (at 10.8.8.22@o2ib6) reconnecting May 08 11:32:09 fir-md1-s2 kernel: Lustre: Skipped 332 previous similar messages May 08 11:32:20 fir-md1-s2 kernel: LustreError: 122141:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340250, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff93202b3e5e80/0x1c35e9ee50437255 lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122141 timeout: 0 lvb_type: 0 May 08 11:32:20 fir-md1-s2 kernel: LustreError: 
122141:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 22 previous similar messages May 08 11:32:21 fir-md1-s2 kernel: LustreError: 122692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340251, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff9307d322c5c0/0x1c35e9ee50439085 lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122692 timeout: 0 lvb_type: 0 May 08 11:32:21 fir-md1-s2 kernel: LustreError: 122692:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 269 previous similar messages May 08 11:32:22 fir-md1-s2 kernel: LustreError: 122082:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340252, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932e35a0f980/0x1c35e9ee50439276 lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122082 timeout: 0 lvb_type: 0 May 08 11:32:22 fir-md1-s2 kernel: LustreError: 122082:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages May 08 11:32:24 fir-md1-s2 kernel: LustreError: 122284:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340254, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932c5260e0c0/0x1c35e9ee5043938e lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122284 timeout: 0 lvb_type: 0 May 08 11:32:24 fir-md1-s2 kernel: LustreError: 122284:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages May 08 11:32:25 fir-md1-s2 kernel: 
Lustre: fir-MDT0003: Connection restored to (at 10.9.109.4@o2ib4) May 08 11:32:25 fir-md1-s2 kernel: Lustre: Skipped 767 previous similar messages May 08 11:32:28 fir-md1-s2 kernel: LustreError: 122288:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340258, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff933f7c2e98c0/0x1c35e9ee50439421 lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122288 timeout: 0 lvb_type: 0 May 08 11:32:28 fir-md1-s2 kernel: LustreError: 122288:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages May 08 11:32:40 fir-md1-s2 kernel: LustreError: 122333:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340270, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff932f6cf81200/0x1c35e9ee5043950f lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122333 timeout: 0 lvb_type: 0 May 08 11:32:40 fir-md1-s2 kernel: LustreError: 122333:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 13 previous similar messages May 08 11:32:51 fir-md1-s2 kernel: Lustre: 122033:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931b36b1b600 x1631544107046928/t0(0) o101->acb643ef-75ad-6f92-b388-57634462f54f@10.8.28.6@o2ib6:26/0 lens 576/0 e 0 to 0 dl 1557340376 ref 2 fl New:/2/ffffffff rc 0/-1 May 08 11:32:51 fir-md1-s2 kernel: Lustre: 122033:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1017 previous similar messages May 08 11:32:58 fir-md1-s2 kernel: LustreError: 122341:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 
1557340288, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff933c076dc800/0x1c35e9ee5043972a lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 881 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122341 timeout: 0 lvb_type: 0 May 08 11:32:58 fir-md1-s2 kernel: LustreError: 122341:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 17 previous similar messages May 08 11:33:00 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e3964c02-b0e5-74a3-d6d2-94acff5fcdaa (at 10.9.105.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9336cefb0c00, cur 1557340380 expire 1557340230 last 1557340153 May 08 11:33:00 fir-md1-s2 kernel: Lustre: 76531:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:33s); client may timeout. req@ffff9333dc216f00 x1631569702173568/t0(0) o101->7cdb2556-bbe1-0eaf-913d-36ae92180c70@10.9.112.2@o2ib4:27/0 lens 568/0 e 0 to 0 dl 1557340347 ref 1 fl Interpret:/2/ffffffff rc 0/-1 May 08 11:33:00 fir-md1-s2 kernel: LustreError: 121422:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.109.19@o2ib4: deadline 30:1s ago req@ffff9333d9e2b300 x1632921135101600/t0(0) o101->50db062e-4c58-ecba-348f-9f830e135d8c@10.9.109.19@o2ib4:29/0 lens 576/0 e 0 to 0 dl 1557340379 ref 1 fl Interpret:/2/ffffffff rc 0/-1 May 08 11:33:00 fir-md1-s2 kernel: Lustre: 76531:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1113 previous similar messages May 08 11:33:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 2027e649-8bcd-4ca1-6dcb-dd11dcd45e21 (at 10.9.101.17@o2ib4) reconnecting May 08 11:33:13 fir-md1-s2 kernel: Lustre: Skipped 735 previous similar messages May 08 11:34:05 fir-md1-s2 kernel: Lustre: 122646:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1557340438/real 1557340438] 
req@ffff9305f73b6f00 x1632349590472080/t0(0) o104->fir-MDT0003@10.8.11.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1557340445 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 May 08 11:34:05 fir-md1-s2 kernel: Lustre: 122646:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages May 08 11:34:33 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.1.30@o2ib6) May 08 11:34:33 fir-md1-s2 kernel: Lustre: Skipped 698 previous similar messages May 08 11:35:00 fir-md1-s2 kernel: Lustre: 122255:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff931d78630f00 x1631793796001744/t0(0) o101->07174424-87a6-1756-764e-10e7f32ab3b2@10.8.23.36@o2ib6:5/0 lens 576/0 e 0 to 0 dl 1557340505 ref 2 fl New:/0/ffffffff rc 0/-1 May 08 11:35:00 fir-md1-s2 kernel: Lustre: 122255:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 798 previous similar messages May 08 11:35:21 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 5924c705-ac90-422d-3e46-a0ea5d70203c (at 10.9.102.26@o2ib4) reconnecting May 08 11:35:21 fir-md1-s2 kernel: Lustre: Skipped 613 previous similar messages May 08 11:35:28 fir-md1-s2 kernel: LustreError: 122052:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1557340438, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0003_UUID lock: ffff931937ed45c0/0x1c35e9ee5072f625 lrc: 3/1,0 mode: --/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 888 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 122052 timeout: 0 lvb_type: 0 May 08 11:35:28 fir-md1-s2 kernel: LustreError: 122052:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages May 08 11:35:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 52dc2b20-3bed-6e90-ce23-8012b240d434 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9327e3e73400, cur 1557340533 expire 1557340383 last 1557340306 May 08 11:35:33 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message May 08 11:35:39 fir-md1-s2 kernel: Lustre: 121638:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:47s); client may timeout. req@ffff93061ab96600 x1632107013191232/t0(0) o53->6b6c38b2-4fd6-8d37-9524-d4553bfeb828@10.0.10.3@o2ib7:22/0 lens 304/0 e 0 to 0 dl 1557340492 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 08 11:35:39 fir-md1-s2 kernel: LustreError: 122874:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.109.10@o2ib4: deadline 30:1s ago req@ffff932be5a95400 x1631575299616496/t0(0) o101->17486c01-5f76-5daa-5bb1-3ab560a54d66@10.9.109.10@o2ib4:8/0 lens 576/0 e 0 to 0 dl 1557340538 ref 1 fl Interpret:/2/ffffffff rc 0/-1 May 08 11:35:39 fir-md1-s2 kernel: LustreError: 122690:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.103.16@o2ib4: deadline 30:1s ago req@ffff932be5a95a00 x1631534913968512/t0(0) o101->65863bd2-5bf4-3857-2c85-73178bef5ac4@10.9.103.16@o2ib4:8/0 lens 584/0 e 0 to 0 dl 1557340538 ref 1 fl Interpret:/0/ffffffff rc 0/-1 May 08 11:35:39 fir-md1-s2 kernel: LustreError: 122874:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 13 previous similar messages May 08 11:35:39 fir-md1-s2 kernel: LustreError: 122690:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 14 previous similar messages May 08 11:35:39 fir-md1-s2 kernel: Lustre: 121638:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 456 previous similar messages May 08 11:41:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c09cfba5-6ff8-041f-f460-5b8951b8da80 (at 10.9.103.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff93207c63cc00, cur 1557340885 expire 1557340735 last 1557340658 May 08 11:41:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message