Feb 24 03:48:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.26.5@o2ib6)
Feb 24 03:48:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 24 03:59:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6fccb913-f8e6-2056-033a-4c02e0e89d4f (at 10.9.103.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb4f6154800, cur 1551009587 expire 1551009437 last 1551009360
Feb 24 03:59:47 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages
Feb 24 04:08:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.31@o2ib4)
Feb 24 04:08:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 04:09:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.46@o2ib4)
Feb 24 04:09:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 04:36:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d848ca64-83ac-6efd-99c9-6a51e07c0fe4 (at 10.9.103.6@o2ib4)
Feb 24 04:36:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 04:46:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 04:46:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 04:47:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2179abbe-1cae-a641-7540-5ea835229e4e (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc3306b1400, cur 1551012442 expire 1551012292 last 1551012215
Feb 24 04:47:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 04:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5f797900-3e70-d5b1-6f40-9a3fbbdfbe67 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa8669f000, cur 1551012722 expire 1551012572 last 1551012495
Feb 24 04:52:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 04:52:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 04:52:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:04:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ed2d79f5-7b8f-9ba3-ea86-1acdd4ac5241 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca18249d000, cur 1551013457 expire 1551013307 last 1551013230
Feb 24 05:04:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:05:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 05:05:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:05:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bde58792-3602-962e-df58-c34b9dbd9136 (at 10.9.101.64@o2ib4) in 182 seconds. I think it's dead, and I am evicting it. exp ffff9ca451200800, cur 1551013533 expire 1551013383 last 1551013351
Feb 24 05:05:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:06:18 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 62f4cede-cb22-5a53-489f-f0659d32a4ca (at 10.9.101.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c984b6d8000, cur 1551013578 expire 1551013428 last 1551013351
Feb 24 05:06:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 05:14:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c6307e0-e5e3-3295-9e75-ac7ce5c5822c (at 10.9.104.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c951845c400, cur 1551014073 expire 1551013923 last 1551013846
Feb 24 05:20:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b3923a77-9e45-3833-8aa3-c4d76a48c186 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb81f0f400, cur 1551014411 expire 1551014261 last 1551014184
Feb 24 05:20:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:22:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 05:22:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:25:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3372d8ae-4a24-9eb1-fcd7-7dd890b14dc0 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb0b6ed6400, cur 1551014748 expire 1551014598 last 1551014521
Feb 24 05:25:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:27:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6)
Feb 24 05:27:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:31:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 88d745c2-8792-e718-77da-e1b0f185d9b0 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cae2c3f5000, cur 1551015083 expire 1551014933 last 1551014856
Feb 24 05:31:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:32:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 05:32:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:35:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 5f272c19-9e37-1149-e1f3-c16175864071 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca687f8a400, cur 1551015356 expire 1551015206 last 1551015129
Feb 24 05:35:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:36:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 05:36:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:40:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.64@o2ib4)
Feb 24 05:40:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:44:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 05:44:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:44:28 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client bf856d5d-9774-0e02-d0af-8d83912c24c3 (at 10.9.101.61@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c982ebf1800, cur 1551015868 expire 1551015718 last 1551015641
Feb 24 05:44:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:47:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6)
Feb 24 05:47:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:50:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4c6307e0-e5e3-3295-9e75-ac7ce5c5822c (at 10.9.104.62@o2ib4)
Feb 24 05:50:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:51:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 05:51:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 05:54:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6880ca8c-0ab1-e4af-8e15-a4cb158645a5 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caf4be68000, cur 1551016469 expire 1551016319 last 1551016242
Feb 24 05:54:29 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages
Feb 24 05:56:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6)
Feb 24 05:56:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 06:01:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6)
Feb 24 06:01:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 06:05:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 89f1b0a4-56cf-2135-528b-8e0618cba428 (at 10.9.103.25@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9ea914d800, cur 1551017110 expire 1551016960 last 1551016883
Feb 24 06:05:10 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages
Feb 24 06:15:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6)
Feb 24 06:15:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 06:19:43 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a86da004-5aa9-5ec4-75a8-754a690b81bd (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd184e66c00, cur 1551017983 expire 1551017833 last 1551017756
Feb 24 06:19:43 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages
Feb 24 06:27:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 759159fb-eae0-c8e8-e6a7-c145abf4538d (at 10.9.107.71@o2ib4)
Feb 24 06:27:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages
Feb 24 06:37:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f8c56e48-2583-434a-b50d-30254178caf9 (at 10.9.103.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9965705000, cur 1551019032 expire 1551018882 last 1551018805
Feb 24 06:37:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 06:44:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f88d3e4f-b8ad-7e3f-e052-b857e571de2a (at 10.9.107.13@o2ib4)
Feb 24 06:44:05 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 24 07:07:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.36@o2ib4)
Feb 24 07:07:55 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages
Feb 24 07:13:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5b2add5c-e8c3-a0bc-1847-97325246277c (at 10.9.103.20@o2ib4)
Feb 24 07:13:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 07:26:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb0b17ca-746e-7622-abd4-371c493253d0 (at 10.9.103.24@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb78d2f9000, cur 1551021974 expire 1551021824 last 1551021747
Feb 24 07:26:14 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 24 07:36:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e9d1b5f8-7ec9-998b-fe00-a6102cb74525 (at 10.9.102.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98cd6b6000, cur 1551022596 expire 1551022446 last 1551022369
Feb 24 07:36:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 07:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 04c67229-21fb-0235-15ed-cccc9063a531 (at 10.8.27.18@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca9d0bb2400, cur 1551023831 expire 1551023681 last 1551023604
Feb 24 07:57:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:01:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.24@o2ib4)
Feb 24 08:01:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:11:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 76b3fc15-397a-53cf-d258-620d61edba45 (at 10.9.102.2@o2ib4)
Feb 24 08:11:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:14:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7fe2e6bd-17ee-5658-a394-750649bff28a (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb8d36ea000, cur 1551024858 expire 1551024708 last 1551024631
Feb 24 08:14:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:14:30 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a8ebae92-67fc-000b-b30f-ff82dfdca2bb (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca0cc642400, cur 1551024870 expire 1551024720 last 1551024643
Feb 24 08:14:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7fe2e6bd-17ee-5658-a394-750649bff28a (at 10.8.14.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca2f57d2800, cur 1551024882 expire 1551024732 last 1551024655
Feb 24 08:22:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7f26f4a5-b09c-90cc-57f5-181682b8827f (at 10.9.103.33@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc47807f800, cur 1551025335 expire 1551025185 last 1551025108
Feb 24 08:26:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 60b24e28-59ab-9096-8306-2f244ef0ff01 (at 10.8.27.18@o2ib6)
Feb 24 08:26:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:29:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2196625d-9992-1b8a-5a12-40751a9cdd4e (at 10.9.107.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98d2b01800, cur 1551025746 expire 1551025596 last 1551025519
Feb 24 08:29:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:41:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a8ebae92-67fc-000b-b30f-ff82dfdca2bb (at 10.8.14.4@o2ib6)
Feb 24 08:41:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:53:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.33@o2ib4)
Feb 24 08:53:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 08:56:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2196625d-9992-1b8a-5a12-40751a9cdd4e (at 10.9.107.2@o2ib4)
Feb 24 08:56:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 09:57:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 77963223-bc75-0922-f3f9-87c125865623 (at 10.8.31.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca4fbde6800, cur 1551031026 expire 1551030876 last 1551030799
Feb 24 09:57:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 10:17:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 967250d6-c1de-8fd6-c33c-9b0bc69f4cab (at 10.9.103.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9965707c00, cur 1551032268 expire 1551032118 last 1551032041
Feb 24 10:17:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 10:23:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6da5269e-e6e7-e930-ea8b-e990b1fd18b0 (at 10.9.101.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c980ee4d400, cur 1551032590 expire 1551032440 last 1551032363
Feb 24 10:23:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 10:23:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6da5269e-e6e7-e930-ea8b-e990b1fd18b0 (at 10.9.101.72@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca44c0a5c00, cur 1551032595 expire 1551032445 last 1551032368
Feb 24 10:23:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 10:25:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.31.5@o2ib6)
Feb 24 10:25:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 10:48:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 92ae2964-22f0-d9af-2db7-23fcbd1fe55b (at 10.9.102.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9e2e6c7000, cur 1551034096 expire 1551033946 last 1551033869
Feb 24 10:52:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dad15dc4-4b75-8f93-57ae-ea1cf5361955 (at 10.9.105.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c992c214800, cur 1551034337 expire 1551034187 last 1551034110
Feb 24 10:52:17 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 24 10:52:18 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8324460b-2865-1a0b-d205-24b94b71985a (at 10.9.105.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9d49d91c00, cur 1551034338 expire 1551034188 last 1551034111
Feb 24 10:52:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 10:53:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.5@o2ib4)
Feb 24 10:53:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 10:56:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.72@o2ib4)
Feb 24 10:56:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 10:58:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 68cec70d-885b-9788-50d2-59fb92bb2775 (at 10.8.10.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca335b32800, cur 1551034690 expire 1551034540 last 1551034463
Feb 24 11:06:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 96fb107e-6354-4a71-2925-e1f8a9a58d15 (at 10.9.103.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc01bf22800, cur 1551035198 expire 1551035048 last 1551034971
Feb 24 11:06:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:23:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.57@o2ib4)
Feb 24 11:23:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:25:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.69@o2ib4)
Feb 24 11:25:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:25:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 8c85a357-78fa-8c22-d8e7-f1ef8d184843 (at 10.9.102.64@o2ib4)
Feb 24 11:25:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:25:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f5fe3def-7261-239e-1d83-4ac81d4cfaa5 (at 10.9.102.61@o2ib4)
Feb 24 11:25:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:30:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.105.16@o2ib4)
Feb 24 11:30:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 24 11:30:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.10.11@o2ib6)
Feb 24 11:30:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:41:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.21@o2ib4)
Feb 24 11:41:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:41:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 829cb29b-c33b-daf1-5d36-1b68d0eb41a6 (at 10.9.107.52@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9965706800, cur 1551037299 expire 1551037149 last 1551037072
Feb 24 11:41:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 11:41:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 829cb29b-c33b-daf1-5d36-1b68d0eb41a6 (at 10.9.107.52@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca451207400, cur 1551037315 expire 1551037165 last 1551037088
Feb 24 11:41:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 11:56:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 14d1e2af-bfe3-6fa6-de44-b510e2b94a1a (at 10.9.101.62@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca052a13c00, cur 1551038194 expire 1551038044 last 1551037967
Feb 24 12:07:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5c8e36ac-a9f0-1174-ba63-42ae10a6b31b (at 10.9.107.52@o2ib4)
Feb 24 12:07:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 12:31:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.62@o2ib4)
Feb 24 12:31:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 12:43:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1ee3037c-52dd-207d-3196-b589ce5ac006 (at 10.9.114.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca098bd5800, cur 1551041032 expire 1551040882 last 1551040805
Feb 24 12:43:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:01:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e2bfa45-013a-e48d-7bcc-c486bbeaa49b (at 10.9.108.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9dae766800, cur 1551042088 expire 1551041938 last 1551041861
Feb 24 13:01:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:04:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1b1c689b-37b8-b4bb-d7e3-6c60b9889af8 (at 10.9.106.21@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9e2e6c3800, cur 1551042269 expire 1551042119 last 1551042042
Feb 24 13:04:29 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages
Feb 24 13:08:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1ee3037c-52dd-207d-3196-b589ce5ac006 (at 10.9.114.14@o2ib4)
Feb 24 13:08:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:16:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b3fc52f5-cc19-f1e2-5d13-43190203fae8 (at 10.9.106.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98a5e37400, cur 1551043015 expire 1551042865 last 1551042788
Feb 24 13:16:55 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 24 13:39:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.65@o2ib4)
Feb 24 13:39:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:39:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.3@o2ib4)
Feb 24 13:39:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:39:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.9@o2ib4)
Feb 24 13:39:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:40:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.1@o2ib4)
Feb 24 13:40:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:40:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.15@o2ib4)
Feb 24 13:40:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:41:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe721a75-74b6-8ad0-6a1f-cdf8875305be (at 10.9.108.13@o2ib4)
Feb 24 13:41:12 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 24 13:41:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.13@o2ib4)
Feb 24 13:41:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:42:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1e0dbe81-97a8-a9d0-3976-d5a8c6b1ba02 (at 10.9.108.16@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9dae763000, cur 1551044523 expire 1551044373 last 1551044296
Feb 24 13:42:03 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages
Feb 24 13:46:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 61c57add-b8f8-4e18-8fcb-c69b19f9f23b (at 10.9.106.19@o2ib4)
Feb 24 13:46:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 13:48:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b3732dc3-bb78-77f7-8107-6ecbf6d88a03 (at 10.9.106.21@o2ib4)
Feb 24 13:48:06 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 24 14:09:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ad5813bc-0e1b-f4a4-6b7d-92ba9e63b92f (at 10.9.107.68@o2ib4)
Feb 24 14:09:50 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages
Feb 24 14:12:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c8d3469c-ec1f-9a7d-c4a5-37f7678112b1 (at 10.9.108.5@o2ib4)
Feb 24 14:12:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 14:13:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4446dbc3-7d22-741f-18a7-993c0ff07f3a (at 10.9.108.8@o2ib4)
Feb 24 14:13:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 14:26:39 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 220634eb-2f7f-cb16-1776-0156cc57d0d5 (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c982ebf6000, cur 1551047199 expire 1551047049 last 1551046972
Feb 24 14:26:39 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages
Feb 24 14:26:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9042fb3a-c0ab-6915-0268-4626f11a023e (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3f7e89c00, cur 1551047207 expire 1551047057 last 1551046980
Feb 24 14:26:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9042fb3a-c0ab-6915-0268-4626f11a023e (at 10.9.106.45@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98a5e33000, cur 1551047211 expire 1551047061 last 1551046984
Feb 24 14:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 275e2730-4d3f-dc89-2b66-e7a8cc62e3d6 (at 10.8.25.28@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9d9c8e0800, cur 1551048735 expire 1551048585 last 1551048508
Feb 24 15:00:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9042fb3a-c0ab-6915-0268-4626f11a023e (at 10.9.106.45@o2ib4)
Feb 24 15:00:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages
Feb 24 15:21:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.25.28@o2ib6)
Feb 24 15:21:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 15:23:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5d079231-9ed4-0730-7be9-e123819c7379 (at 10.8.13.22@o2ib6)
Feb 24 15:23:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 15:23:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 36c47535-7f19-9692-c5d3-687f789af19d (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb22cbaa000, cur 1551050635 expire 1551050485 last 1551050408
Feb 24 15:23:55 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 24 15:24:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c06daabf-9964-0f8e-e088-e365e2438955 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbfd83a0800, cur 1551050641 expire 1551050491 last 1551050414
Feb 24 15:24:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 15:25:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe28a108-baf3-cd9c-d6ff-86b12f332cdd (at 10.8.11.10@o2ib6)
Feb 24 15:25:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 15:27:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5ec5206a-1c58-eeff-df49-5fe7d326e368 (at 10.9.104.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca7e2210c00, cur 1551050855 expire 1551050705 last 1551050628
Feb 24 15:37:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f00f7e2a-8ac6-86cf-8aa7-d26ad9d6b9e7 (at 10.9.106.47@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3f7e8b400, cur 1551051448 expire 1551051298 last 1551051221
Feb 24 15:37:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 15:37:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f00f7e2a-8ac6-86cf-8aa7-d26ad9d6b9e7 (at 10.9.106.47@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98a5e35400, cur 1551051457 expire 1551051307 last 1551051230
Feb 24 15:37:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 15:44:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8f440af3-cd92-379d-a078-f053f705469f (at 10.9.106.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb6323ca000, cur 1551051858 expire 1551051708 last 1551051631
Feb 24 15:46:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cd13369b-5f5c-de37-391a-25d067b062d5 (at 10.9.106.64@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc47807b800, cur 1551052017 expire 1551051867 last 1551051790
Feb 24 15:46:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 15:57:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 15:57:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:04:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.28@o2ib4)
Feb 24 16:04:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:07:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client db209fb3-e752-728f-7a78-57920189bb31 (at 10.9.106.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca04beecc00, cur 1551053224 expire 1551053074 last 1551052997
Feb 24 16:07:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:08:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e5be9ff2-873f-0542-6c5f-13af50413057 (at 10.8.30.1@o2ib6) in 224 seconds. I think it's dead, and I am evicting it. exp ffff9ca05e3e4400, cur 1551053300 expire 1551053150 last 1551053076
Feb 24 16:08:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:09:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 032aae30-e439-a130-1f18-efe924baca21 (at 10.9.106.70@o2ib4) in 194 seconds. I think it's dead, and I am evicting it. exp ffff9ca04beebc00, cur 1551053376 expire 1551053226 last 1551053182
Feb 24 16:09:36 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages
Feb 24 16:10:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client fa45ca8a-a52a-b150-8639-65095d50ab90 (at 10.9.106.70@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9832fa1c00, cur 1551053409 expire 1551053259 last 1551053182
Feb 24 16:12:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 9d49f0a0-c7fd-b022-595d-2f049f2ecdf5 (at 10.9.106.47@o2ib4)
Feb 24 16:12:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:12:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.58@o2ib4)
Feb 24 16:12:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:20:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.64@o2ib4)
Feb 24 16:20:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:34:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.60@o2ib4)
Feb 24 16:34:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:37:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 5ee98715-4bcf-b4e8-27bc-89f9f237369b (at 10.8.30.5@o2ib6)
Feb 24 16:37:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:38:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.30.4@o2ib6)
Feb 24 16:38:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:38:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.30.1@o2ib6)
Feb 24 16:38:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:41:54 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2a7cd23c-895e-9b12-e591-c14dd60309c3 (at 10.8.30.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc933a8a800, cur 1551055314 expire 1551055164 last 1551055087
Feb 24 16:41:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 16:42:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.30.4@o2ib6)
Feb 24 16:42:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages
Feb 24 16:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8766aca3-0b1a-78de-082b-08cc790415f9 (at 10.8.30.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb1fe248c00, cur 1551055333 expire 1551055183 last 1551055106
Feb 24 16:43:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f682de02-8eea-7b32-39fc-79d659b66610 (at 10.9.107.51@o2ib4) in 196 seconds. I think it's dead, and I am evicting it. exp ffff9ca3dbe36400, cur 1551055390 expire 1551055240 last 1551055194
Feb 24 16:43:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 16:43:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 56d4d565-fe73-cd79-8098-610edfde7d3b (at 10.9.107.51@o2ib4) in 217 seconds. I think it's dead, and I am evicting it. exp ffff9ca9d0bb7000, cur 1551055409 expire 1551055259 last 1551055192
Feb 24 16:43:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 032aae30-e439-a130-1f18-efe924baca21 (at 10.9.106.70@o2ib4)
Feb 24 16:43:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 16:44:26 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 76cda57d-9fb9-d11d-04a3-a76ef43e57c5 (at 10.9.107.49@o2ib4) in 152 seconds. I think it's dead, and I am evicting it. exp ffff9cd45563c400, cur 1551055466 expire 1551055316 last 1551055314
Feb 24 16:44:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 17:08:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f682de02-8eea-7b32-39fc-79d659b66610 (at 10.9.107.51@o2ib4)
Feb 24 17:08:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 17:09:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.49@o2ib4)
Feb 24 17:09:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 17:53:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a6d2f42c-4a04-9c4a-cc81-70ba500cf671 (at 10.9.106.38@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98a5e33400, cur 1551059596 expire 1551059446 last 1551059369
Feb 24 17:53:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 18:03:47 fir-md1-s1 kernel: EXT4-fs (sdk2): error count since last fsck: 5
Feb 24 18:03:47 fir-md1-s1 kernel: EXT4-fs (sdk2): initial error at time 1550022155: ext4_mb_generate_buddy:757
Feb 24 18:03:47 fir-md1-s1 kernel: EXT4-fs (sdk2): last error at time 1550448029: ext4_mb_generate_buddy:757
Feb 24 18:28:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to a6d2f42c-4a04-9c4a-cc81-70ba500cf671 (at 10.9.106.38@o2ib4)
Feb 24 18:28:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 19:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b312cb06-2b52-b7e1-8133-1c9687ba2033 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb0a335a000, cur 1551063703 expire 1551063553 last 1551063476
Feb 24 19:01:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 19:12:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6)
Feb 24 19:12:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 19:22:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1ecf1ab2-d481-bbed-3893-941aef9b4486 (at 10.8.11.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cac28ba8400, cur 1551064979 expire 1551064829 last 1551064752
Feb 24 19:22:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 19:23:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1ecf1ab2-d481-bbed-3893-941aef9b4486 (at 10.8.11.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9ece87f000, cur 1551064984 expire 1551064834 last 1551064757
Feb 24 19:39:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 86b1a898-94cc-a41f-85e1-12abf7418dc9 (at 10.8.7.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cabfabe3000, cur 1551065957 expire 1551065807 last 1551065730
Feb 24 19:39:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message
Feb 24 19:39:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5ba2142c-879f-ac28-9d4a-a3788afebea0 (at 10.8.7.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca05e3e0c00, cur 1551065961 expire 1551065811 last 1551065734
Feb 24 19:39:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages
Feb 24 19:54:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.11.23@o2ib6)
Feb 24 19:54:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages
Feb 24 20:01:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client aac614cf-2be0-7c7b-8dbf-992b5e2de845 (at 10.8.27.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff9c9d49d94800, cur 1551067281 expire 1551067131 last 1551067054 Feb 24 20:01:21 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Feb 24 20:01:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 680b2f8b-c457-9b65-c1bc-58d27b9ae2fe (at 10.8.27.14@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c96ace75000, cur 1551067302 expire 1551067152 last 1551067075 Feb 24 20:04:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 20:04:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:04:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c3535875-d3c1-a4f1-4d7f-21c04068d08d (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc43f8c2400, cur 1551067483 expire 1551067333 last 1551067256 Feb 24 20:04:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 24 20:08:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.7.33@o2ib6) Feb 24 20:08:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:08:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.3@o2ib6) Feb 24 20:08:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:09:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.7.35@o2ib6) Feb 24 20:09:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7122cf14-0523-fe12-768f-cd0ed99220da (at 10.8.27.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb06aab7400, cur 1551067744 expire 1551067594 last 1551067517 Feb 24 20:09:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 20:09:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.6@o2ib6) Feb 24 20:09:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:09:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 66e4ac0c-6519-785c-aa30-3457dbc9eea1 (at 10.8.27.8@o2ib6) Feb 24 20:09:46 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 20:12:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 20:12:02 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 20:12:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 720160b3-b510-fd4a-7aef-667fe71d4b1d (at 10.8.27.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9d9c8e0c00, cur 1551067979 expire 1551067829 last 1551067752 Feb 24 20:12:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:19:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 20:19:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 95a7bee7-b4e2-04fd-3dcf-8552f5380f6c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbfca242400, cur 1551068428 expire 1551068278 last 1551068201 Feb 24 20:20:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 20:23:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe28a108-baf3-cd9c-d6ff-86b12f332cdd (at 10.8.11.10@o2ib6) Feb 24 20:23:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:28:06 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6fea30e0-b3a6-14db-b333-2c99febce837 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbe636c1000, cur 1551068886 expire 1551068736 last 1551068659 Feb 24 20:28:06 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 20:31:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to aac614cf-2be0-7c7b-8dbf-992b5e2de845 (at 10.8.27.14@o2ib6) Feb 24 20:31:31 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 20:34:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dd23da3a-484d-0eb4-af89-89a069ff0621 (at 10.9.104.49@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa05620c00, cur 1551069293 expire 1551069143 last 1551069066 Feb 24 20:34:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:41:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 2e6063a2-ca9b-ae5d-abce-e3daf4d673e4 (at 10.9.103.23@o2ib4) Feb 24 20:41:33 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 20:42:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 248ffa45-ee9e-3f32-a526-c435dd0ee693 (at 10.8.17.7@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c96f8453400, cur 1551069778 expire 1551069628 last 1551069551 Feb 24 20:42:58 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 20:50:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 00cd381d-5246-e5cd-af5e-792229d3fea2 (at 10.9.104.63@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c951845f800, cur 1551070250 expire 1551070100 last 1551070023 Feb 24 20:50:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 20:55:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 20609818-b83c-bf65-0dd2-090d3c6e2314 (at 10.9.108.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc1b8b76800, cur 1551070518 expire 1551070368 last 1551070291 Feb 24 20:55:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 20:55:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 4791ab67-b098-1c8e-2f57-bf7b0dfb741f (at 10.9.106.26@o2ib4) Feb 24 20:55:34 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 21:07:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9342c549-8cbe-27f5-a8b8-f7759a7fb2aa (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbea7e8c400, cur 1551071236 expire 1551071086 last 1551071009 Feb 24 21:07:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 21:09:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.49@o2ib4) Feb 24 21:09:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 21:19:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 26cb8da7-6185-1e8c-4a9c-57f28738fb43 (at 10.9.106.32@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c982ebf0000, cur 1551071972 expire 1551071822 last 1551071745 Feb 24 21:19:32 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 21:21:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 21:21:10 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 24 21:30:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d7132a19-51b8-e098-d0d8-a2755039375a (at 10.8.25.20@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9caf05272800, cur 1551072639 expire 1551072489 last 1551072412 Feb 24 21:30:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 21:33:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 74a03b4f-d0c9-a84a-2aa4-fc50ef9db767 (at 10.8.11.9@o2ib6) Feb 24 21:33:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 24 21:42:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8e2f5957-6dae-32b0-e35e-e535b38ca109 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca7a2ac3000, cur 1551073333 expire 1551073183 last 1551073106 Feb 24 21:42:13 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 24 21:45:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.32@o2ib4) Feb 24 21:45:21 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 21:58:14 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 2f904a16-897c-eebc-ac36-ca9f0db916f0 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc35d3fe400, cur 1551074294 expire 1551074144 last 1551074067 Feb 24 21:58:14 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 21:59:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to d7132a19-51b8-e098-d0d8-a2755039375a (at 10.8.25.20@o2ib6) Feb 24 21:59:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 22:12:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 22:12:02 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 24 22:12:34 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client cf9bf1da-0d34-eabc-4396-a8cd530033c3 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cd1abbec000, cur 1551075154 expire 1551075004 last 1551074927 Feb 24 22:12:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 24 22:23:12 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f8c148af-48cc-a50d-dac6-61c4b3987507 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc340242800, cur 1551075792 expire 1551075642 last 1551075565 Feb 24 22:23:12 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 22:24:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 24 22:24:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 22:34:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 65449291-0a20-4dcc-a5b6-a53ab778bafd (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc3eb7eb000, cur 1551076470 expire 1551076320 last 1551076243 Feb 24 22:34:30 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 24 22:35:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.20.33@o2ib6) Feb 24 22:35:52 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 24 22:46:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 24 22:46:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 22:47:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c96f8680-8899-c6f6-b9d8-0338b01efeba (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c94ce214800, cur 1551077241 expire 1551077091 last 1551077014 Feb 24 22:47:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 24 23:03:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.21.7@o2ib6) Feb 24 23:03:43 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:10:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e4a48af5-173b-69ab-3bac-2bcc464bdd13 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb88424f800, cur 1551078635 expire 1551078485 last 1551078408 Feb 24 23:10:35 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:15:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 56ac631c-5b89-f8ef-0bb7-89282c3bc1be (at 10.9.106.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca4e271c000, cur 1551078900 expire 1551078750 last 1551078673 Feb 24 23:15:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 23:21:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 23:21:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:21:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8b348f66-3fc4-430a-d8c7-359d09e3c590 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb132e04000, cur 1551079295 expire 1551079145 last 1551079068 Feb 24 23:21:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 24 23:31:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 24 23:31:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:31:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd9b9589-11d3-79b8-1ee1-3b3f95c56163 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca22e6f8000, cur 1551079896 expire 1551079746 last 1551079669 Feb 24 23:31:36 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:41:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 24 23:41:45 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:42:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 995ad94b-fea2-b418-fe5b-a795bf308d23 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca0b5469000, cur 1551080540 expire 1551080390 last 1551080313 Feb 24 23:42:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 24 23:52:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 24f54dae-dc87-c0e2-6af2-2fff5088159d (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc1d32b8000, cur 1551081142 expire 1551080992 last 1551080915 Feb 24 23:52:22 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 24 23:53:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 24 23:53:20 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 00:04:12 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0cc22e52-6a05-4333-6377-0def4a3f5684 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc4bd734000, cur 1551081852 expire 1551081702 last 1551081625 Feb 25 00:04:12 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 00:06:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 00:06:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 00:15:38 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 17ce3997-55e2-92f7-0093-c8c466020edd (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc43dbedc00, cur 1551082538 expire 1551082388 last 1551082311 Feb 25 00:15:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 25 00:28:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7ca862d9-fb36-f06e-d642-e42cfd5c8c83 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca728bb8000, cur 1551083284 expire 1551083134 last 1551083057 Feb 25 00:28:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 00:28:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 00:28:56 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 00:35:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 00:35:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 00:42:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f390a994-cc03-db44-d95c-e8bdd61cdd08 (at 10.8.25.10@o2ib6) Feb 25 00:42:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 00:42:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client fc07cbaa-b975-254a-ff30-137392ee3f44 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbdb7edc400, cur 1551084157 expire 1551084007 last 1551083930 Feb 25 00:42:37 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 00:53:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 00:53:28 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 00:59:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1f2b0e26-473c-92cc-1fc9-3f880eb97666 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca80037ec00, cur 1551085189 expire 1551085039 last 1551084962 Feb 25 00:59:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 01:04:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 01:04:43 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 01:10:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7845f72c-b8b2-58a5-a96c-1e234dd4860e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc0a3240000, cur 1551085839 expire 1551085689 last 1551085612 Feb 25 01:10:39 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 25 01:17:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 01:17:02 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 01:21:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 12daae0d-e8cd-642c-33c5-6f85068519db (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cacadf62c00, cur 1551086502 expire 1551086352 last 1551086275 Feb 25 01:21:42 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 01:28:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6f23ad32-0dd1-26f7-1bbe-7fefdeb50a2a (at 10.8.19.1@o2ib6) Feb 25 01:28:39 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 01:40:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 01:40:21 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 25 01:40:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 152568df-f8f3-5680-1eb9-4bfb2c89211f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca79724f800, cur 1551087639 expire 1551087489 last 1551087412 Feb 25 01:40:39 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Feb 25 01:57:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 37249470-7203-2d46-2985-30a67dbbc6fd (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca81d68cc00, cur 1551088628 expire 1551088478 last 1551088401 Feb 25 01:57:08 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 01:57:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 01:57:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 02:08:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.23@o2ib4) Feb 25 02:08:38 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 02:17:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 204c5e9c-8d7a-428e-a6e6-ffbb9bd36a27 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb17e3b000, cur 1551089839 expire 1551089689 last 1551089612 Feb 25 02:17:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 02:19:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 02:19:58 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 25 02:20:42 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 583d7164-4041-58e5-b1df-fe93b8abab03 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cba1aadc400, cur 1551090042 expire 1551089892 last 1551089815 Feb 25 02:20:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 02:23:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 624a793a-e972-6bb4-fee5-00b7df4bd340 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc425171800, cur 1551090225 expire 1551090075 last 1551089998 Feb 25 02:23:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 02:31:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1cd5416c-4828-ac30-6538-cc2847b45533 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca825edf800, cur 1551090709 expire 1551090559 last 1551090482 Feb 25 02:31:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 02:32:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 02:32:05 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 02:42:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 02:42:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 02:44:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 08958bbc-0f90-1cbd-61ae-768cfa6c9459 (at 10.9.104.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa05626400, cur 1551091446 expire 1551091296 last 1551091219 Feb 25 02:44:06 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 02:55:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 02:55:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 03:21:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.69@o2ib4) Feb 25 03:21:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 03:37:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client dea6ff96-022f-bdf6-cc03-2d525b850638 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9692549800, cur 1551094641 expire 1551094491 last 1551094414 Feb 25 03:37:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 03:44:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 03:44:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:07:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3bc55b46-df6c-46c6-b66d-183e8631b835 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caf83fe1400, cur 1551096430 expire 1551096280 last 1551096203 Feb 25 04:07:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:11:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 04:11:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:14:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe28a108-baf3-cd9c-d6ff-86b12f332cdd (at 10.8.11.10@o2ib6) Feb 25 04:14:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:17:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 04:17:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:18:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client cc1f3015-effc-6c19-6b5d-d1280033b128 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cae80bc1c00, cur 1551097099 expire 1551096949 last 1551096872 Feb 25 04:18:19 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 04:20:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 04:20:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:21:33 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e4d8a8b7-9588-d384-e69c-9ea51d4da6dc (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cafadafb800, cur 1551097293 expire 1551097143 last 1551097066 Feb 25 04:21:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:21:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3cd4b19e-f214-f55a-3c2a-78687885e70c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3d2938800, cur 1551097296 expire 1551097146 last 1551097069 Feb 25 04:22:49 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3652bd79-4f32-b5dc-ea3b-cf15e1f5fb84 (at 10.8.11.10@o2ib6) in 169 seconds. I think it's dead, and I am evicting it. exp ffff9c9ae823c400, cur 1551097369 expire 1551097219 last 1551097200 Feb 25 04:22:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 04:23:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1b0a86b3-0fd0-cda1-1107-0c6301aea8f2 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca71432b000, cur 1551097427 expire 1551097277 last 1551097200 Feb 25 04:23:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 04:25:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe28a108-baf3-cd9c-d6ff-86b12f332cdd (at 10.8.11.10@o2ib6) Feb 25 04:25:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:33:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a75a6a3c-cb83-2a7b-82b2-3df2cdacd1c6 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa502e1400, cur 1551097984 expire 1551097834 last 1551097757 Feb 25 04:35:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe28a108-baf3-cd9c-d6ff-86b12f332cdd (at 10.8.11.10@o2ib6) Feb 25 04:35:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:35:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 229460c3-4182-2dec-2c0d-9ca77ec4a979 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb0db6e9400, cur 1551098130 expire 1551097980 last 1551097903 Feb 25 04:35:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:36:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 04:36:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:38:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d12e7805-0195-6cbb-9c53-6eb86c3dcff7 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cacad255c00, cur 1551098323 expire 1551098173 last 1551098096 Feb 25 04:38:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:39:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 04:39:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:42:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 66075cf2-9785-a74c-455c-df4c6c6263fd (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb02523a400, cur 1551098551 expire 1551098401 last 1551098324 Feb 25 04:42:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 04:46:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 04:46:30 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 04:52:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 69bc773c-93b7-5ebc-266d-a59bf91f8620 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca4ae7bc000, cur 1551099170 expire 1551099020 last 1551098943 Feb 25 04:52:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 04:52:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 04:52:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 05:03:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 13cda212-3c3a-2354-2af2-efa6674c6d4b (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cac6cef3000, cur 1551099788 expire 1551099638 last 1551099561 Feb 25 05:03:08 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 05:05:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 05:05:11 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 05:15:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 20f8fe85-8425-8176-626e-350b50e6f45c (at 10.9.106.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca04beeb000, cur 1551100529 expire 1551100379 last 1551100302 Feb 25 05:15:29 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 05:22:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 05:22:34 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 05:26:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client af2423f2-1739-a090-98aa-6239f4f641e0 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ccfd1bac400, cur 1551101181 expire 1551101031 last 1551100954 Feb 25 05:26:21 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 05:33:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 05:33:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 05:40:38 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 82d5f96f-0b93-394d-49da-639bac969b8d (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9681be2c00, cur 1551102038 expire 1551101888 last 1551101811 Feb 25 05:40:38 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 05:47:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 05:47:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 05:51:16 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client d699e73a-dbf2-7a13-6b13-8415e39704db (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cccd9e5e400, cur 1551102676 expire 1551102526 last 1551102449 Feb 25 05:51:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 06:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c6708fe1-bd2e-9567-4c18-7af0628e1eeb (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb42af23c00, cur 1551104323 expire 1551104173 last 1551104096 Feb 25 06:18:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 06:22:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 06:22:14 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 06:28:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 997214e0-1aa8-bfb7-6d54-7e553060abad (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb27c35ac00, cur 1551104890 expire 1551104740 last 1551104663 Feb 25 06:28:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 06:30:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 06:30:40 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 06:36:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5b8b8274-3329-70cb-e2f9-7c5876fb0ee5 (at 10.8.20.15@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9ca825684000, cur 1551105370 expire 1551105220 last 1551105143 Feb 25 06:36:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 06:41:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 06:41:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 06:47:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 65d1370e-308d-58b6-f0c9-7ef4f0b85028 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3e34f2400, cur 1551106042 expire 1551105892 last 1551105815 Feb 25 06:47:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 06:47:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 06:47:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 06:58:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ee2183cf-f67e-8663-d79a-58c5f102bdc3 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca29be81c00, cur 1551106687 expire 1551106537 last 1551106460 Feb 25 06:58:07 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 07:01:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 07:01:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 07:09:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 39b74f02-0c4c-fd51-e621-4bd6eb7173c0 (at 10.9.103.18@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9e95dcec00, cur 1551107379 expire 1551107229 last 1551107152 Feb 25 07:09:39 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 07:15:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 07:15:15 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 07:43:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 07:43:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 07:44:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c2fd55c5-d390-9c70-abea-7d70fa7c0357 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cba0c755c00, cur 1551109469 expire 1551109319 last 1551109242 Feb 25 07:44:29 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 07:45:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.18@o2ib4) Feb 25 07:45:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 07:47:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client fe595d6e-f74b-8fd6-940a-415fbf373982 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc371a41400, cur 1551109665 expire 1551109515 last 1551109438 Feb 25 07:47:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 07:53:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 975f7ad7-f03d-15c7-83b1-221b445246ae (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca7907aa400, cur 1551110023 expire 1551109873 last 1551109796 Feb 25 07:53:43 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 07:58:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 07:58:15 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 08:04:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 08:04:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:04:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 47cbbf7a-086f-0227-f3f0-b1bf06fab51b (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca342e3a400, cur 1551110651 expire 1551110501 last 1551110424 Feb 25 08:04:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:09:04 fir-md1-s1 kernel: Lustre: 51443:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551110933/real 1551110933] req@ffff9c9518ffa700 x1625959319111840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551110944 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 08:09:04 fir-md1-s1 kernel: Lustre: 51443:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 08:09:15 fir-md1-s1 kernel: Lustre: 51443:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551110944/real 1551110944] req@ffff9c9518ffa700 x1625959319111840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551110955 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 08:09:38 fir-md1-s1 kernel: Lustre: 51443:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551110966/real 1551110966] req@ffff9c9518ffa700 x1625959319111840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 
1551110977 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 08:09:38 fir-md1-s1 kernel: Lustre: 51443:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 08:09:53 fir-md1-s1 kernel: LustreError: 51443:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) returned error from blocking AST (req@ffff9c9518ffa700 x1625959319111840 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9cc072a4bf00/0xb7044c63b55fd23a lrc: 4/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 539 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xd6802e1cd314dc72 expref: 10 pid: 22241 timeout: 1096117 lvb_type: 0 Feb 25 08:09:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 08:09:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:09:53 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Feb 25 08:09:53 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 60s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9cc072a4bf00/0xb7044c63b55fd23a lrc: 3/0,0 mode: PR/PR res: [0x200000400:0x5:0x0].0x0 bits 0x13/0x0 rrc: 539 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xd6802e1cd314dc72 expref: 11 pid: 22241 timeout: 0 lvb_type: 0 Feb 25 08:12:55 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Mon Feb 25 08:12:55 2019 Feb 25 08:14:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 08:14:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:15:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 76379dee-0ebd-960e-9130-f64a5595d718 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cac69a05800, cur 1551111323 expire 1551111173 last 1551111096 Feb 25 08:15:23 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 25 08:17:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 08:17:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:25:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 97bc8e0c-1614-4de0-a593-98b585b7fd0b (at 10.9.103.30@o2ib4) Feb 25 08:25:25 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 08:28:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 874bae97-c73c-de86-53e9-317637b9cc61 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb130fc9800, cur 1551112137 expire 1551111987 last 1551111910 Feb 25 08:28:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 08:48:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 08:48:09 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 08:49:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c1362e0d-4bd1-387a-1fff-93d3147b3840 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cceebed8800, cur 1551113340 expire 1551113190 last 1551113113 Feb 25 08:49:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:56:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 08:56:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 08:59:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 471d4d1d-7684-41a1-beb7-bd2d4f0908de (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb3f5227400, cur 1551113968 expire 1551113818 last 1551113741 Feb 25 08:59:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 09:00:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 09:00:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 09:07:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 09:07:30 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 09:12:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 32b6d3dd-e6ba-d67f-b889-fe8b65a0fcd6 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb20276dc00, cur 1551114752 expire 1551114602 last 1551114525 Feb 25 09:12:32 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 09:17:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 09:17:39 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 09:23:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 41739cd5-7b25-4f7d-893e-28ee147d7ba7 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbefa3fd000, cur 1551115397 expire 1551115247 last 1551115170 Feb 25 09:23:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 09:30:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 09:30:12 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 09:42:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b9f4ca05-0fa9-649e-b1de-383d9bc47492 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc25e319800, cur 1551116569 expire 1551116419 last 1551116342 Feb 25 09:42:49 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 09:48:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 09:48:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 09:54:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6be83b83-a986-085c-687c-9542da46466e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbe0f219400, cur 1551117293 expire 1551117143 last 1551117066 Feb 25 09:54:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 10:01:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.24.22@o2ib6) Feb 25 10:01:08 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Feb 25 10:07:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f8d08b61-6609-0865-48e3-a7a71df56240 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb7b8ead000, cur 1551118020 expire 1551117870 last 1551117793 Feb 25 10:07:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 10:14:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 10:14:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 10:23:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f69edaf7-429d-e7e9-8601-c87929294bbc (at 10.9.108.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb6323c8000, cur 1551119021 expire 1551118871 last 1551118794 Feb 25 10:23:41 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 10:34:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a18587f4-6669-8a89-c311-224538b5a6f2 (at 10.8.27.32@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9f1d0e9800, cur 1551119640 expire 1551119490 last 1551119413 Feb 25 10:34:00 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 25 10:34:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 10:34:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 10:45:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0bf5624d-e79b-6fc1-1e06-62562c85448b (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cacc921c000, cur 1551120303 expire 1551120153 last 1551120076 Feb 25 10:45:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 10:46:03 fir-md1-s1 kernel: Lustre: MGS: Connection restored to f69edaf7-429d-e7e9-8601-c87929294bbc (at 10.9.108.22@o2ib4) Feb 25 10:46:03 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 10:53:45 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551120818/real 1551120818] req@ffff9cd0b57db300 x1625959411060576/t0(0) o104->fir-MDT0000@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551120825 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 10:53:45 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 10:53:52 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551120825/real 1551120825] req@ffff9cd0b57db300 x1625959411060576/t0(0) o104->fir-MDT0000@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551120832 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 10:54:06 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551120839/real 1551120839] req@ffff9cd0b57db300 x1625959411060576/t0(0) o104->fir-MDT0000@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 
1551120846 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 10:54:06 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 10:54:27 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551120860/real 1551120860] req@ffff9cd0b57db300 x1625959411060576/t0(0) o104->fir-MDT0000@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551120867 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 10:54:27 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 25 10:55:09 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551120902/real 1551120902] req@ffff9cd0b57db300 x1625959411060576/t0(0) o104->fir-MDT0000@10.9.0.62@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551120909 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 10:55:09 fir-md1-s1 kernel: Lustre: 51425:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Feb 25 10:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cf27932c-5cfb-509a-c7ce-6753e8ed5f45 (at 10.8.0.66@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca44c0a7c00, cur 1551120923 expire 1551120773 last 1551120696 Feb 25 10:55:23 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 10:57:38 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.53@o2ib4) Feb 25 10:57:38 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 25 11:12:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ff6dc77a-4656-6fd1-f0d6-fbaf7b10161c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbad0258400, cur 1551121930 expire 1551121780 last 1551121703 Feb 25 11:12:10 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 25 11:14:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 11:14:39 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 11:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 844c642b-70a6-ed76-9d2b-d135d03f8b90 (at 10.9.105.66@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca098bd2800, cur 1551123097 expire 1551122947 last 1551122870 Feb 25 11:31:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 11:32:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 65d96ce3-d21d-6631-277d-e96b762f5c17 (at 10.9.0.62@o2ib4) Feb 25 11:32:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 11:43:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 45f81bad-ba79-a5b8-97f4-c718cac35552 (at 10.8.0.82@o2ib6) Feb 25 11:43:09 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 12:01:26 fir-md1-s1 kernel: Lustre: 21922:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551124879/real 1551124879] req@ffff9cb9eeee3900 x1625959508216128/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551124886 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 12:01:26 fir-md1-s1 kernel: Lustre: 21922:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Feb 25 12:01:37 fir-md1-s1 kernel: Lustre: 22219:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551124890/real 1551124890] req@ffff9cae0fa00c00 x1625959508328800/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551124897 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:01:37 fir-md1-s1 kernel: Lustre: 
22219:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 12:01:58 fir-md1-s1 kernel: Lustre: 22219:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551124911/real 1551124911] req@ffff9cae0fa00c00 x1625959508328800/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551124918 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:01:58 fir-md1-s1 kernel: Lustre: 22219:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Feb 25 12:02:35 fir-md1-s1 kernel: Lustre: 22186:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551124948/real 1551124948] req@ffff9cb7ebabe900 x1625959509236976/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551124955 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:02:35 fir-md1-s1 kernel: Lustre: 22186:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 24 previous similar messages Feb 25 12:03:51 fir-md1-s1 kernel: Lustre: 21993:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551125024/real 1551125024] req@ffff9cbb956bcb00 x1625959509852784/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551125031 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:03:51 fir-md1-s1 kernel: Lustre: 21993:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 115 previous similar messages Feb 25 12:03:53 fir-md1-s1 kernel: LustreError: 21922:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.18.31@o2ib6) failed to reply to blocking AST (req@ffff9cb9eeee3900 x1625959508216128 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cbde223d580/0xb7044c63e5c7f8d9 lrc: 4/0,0 mode: PR/PR res: [0x2c0007180:0x1b6e:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.18.31@o2ib6 remote: 0x59d6657f6094d980 expref: 1371 pid: 22179 timeout: 1110155 lvb_type: 0 Feb 
25 12:03:53 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.18.31@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 12:03:53 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.18.31@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cbde223d580/0xb7044c63e5c7f8d9 lrc: 3/0,0 mode: PR/PR res: [0x2c0007180:0x1b6e:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.8.18.31@o2ib6 remote: 0x59d6657f6094d980 expref: 1372 pid: 22179 timeout: 0 lvb_type: 0 Feb 25 12:03:53 fir-md1-s1 kernel: LustreError: 21922:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9cbfad2e4200 x1625959513752384/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Feb 25 12:03:53 fir-md1-s1 kernel: LustreError: 21922:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Feb 25 12:03:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 1473333c-9fb3-e2e3-d9a1-ba82eb425068 (at 10.9.105.66@o2ib4) Feb 25 12:03:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:04:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a78ab922-10b1-e1a7-89e3-f90306408594 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cae3c33f800, cur 1551125089 expire 1551124939 last 1551124862 Feb 25 12:04:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:05:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 12:05:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:08:48 fir-md1-s1 kernel: Lustre: 21885:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Feb 25 12:08:48 fir-md1-s1 kernel: Lustre: 21885:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 14 previous similar messages Feb 25 12:17:34 fir-md1-s1 kernel: Lustre: 47903:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Feb 25 12:18:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3ce6d475-6724-c117-8c44-da8378e50030 (at 10.9.101.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca451203000, cur 1551125893 expire 1551125743 last 1551125666 Feb 25 12:18:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 12:36:15 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e2371779-7491-8f2c-607f-69e73919828d (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbaa6e6a000, cur 1551126975 expire 1551126825 last 1551126748 Feb 25 12:36:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6c31a5cc-998a-3064-83b3-9f96e026df9d (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9bf280b800, cur 1551126977 expire 1551126827 last 1551126750 Feb 25 12:36:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 12:37:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 12:37:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:42:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4840388f-4bf3-516e-4c43-00080a4a9c17 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc13d244c00, cur 1551127326 expire 1551127176 last 1551127099 Feb 25 12:42:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 16632437-7bdf-adea-e03c-25744239d454 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb65cff1800, cur 1551127333 expire 1551127183 last 1551127106 Feb 25 12:42:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 12:42:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 12:42:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:44:28 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551127461/real 1551127461] req@ffff9caaee355100 x1625959579387840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551127468 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 12:44:28 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Feb 25 12:44:49 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551127482/real 1551127482] req@ffff9caaee355100 x1625959579387840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551127489 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:44:49 fir-md1-s1 kernel: Lustre: 
21955:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 12:45:31 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551127524/real 1551127524] req@ffff9caaee355100 x1625959579387840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551127531 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:45:31 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 25 12:46:48 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551127601/real 1551127601] req@ffff9caaee355100 x1625959579387840/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551127608 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 12:46:48 fir-md1-s1 kernel: Lustre: 21955:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Feb 25 12:46:55 fir-md1-s1 kernel: LustreError: 21955:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9caaee355100 x1625959579387840 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9c952a5b6c00/0xb7044c63ed61b5b1 lrc: 4/0,0 mode: PR/PR res: [0x2000068d3:0x6:0x0].0x0 bits 0x1b/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xb57483b050d2316c expref: 77 pid: 22152 timeout: 1112736 lvb_type: 0 Feb 25 12:46:55 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 12:46:55 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9c952a5b6c00/0xb7044c63ed61b5b1 lrc: 3/0,0 mode: PR/PR res: [0x2000068d3:0x6:0x0].0x0 bits 0x1b/0x0 
rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xb57483b050d2316c expref: 78 pid: 22152 timeout: 0 lvb_type: 0 Feb 25 12:47:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cc6e114f-a671-c1d1-8519-730567c2dff7 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc40fafb000, cur 1551127644 expire 1551127494 last 1551127417 Feb 25 12:47:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 12:47:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:49:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6a53c2a7-4511-34e6-fb4c-bd44f9c46ce2 (at 10.9.101.69@o2ib4) Feb 25 12:49:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:52:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c313559f-a84d-1bc9-e226-9f1e30bc5add (at 10.8.26.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98d2b02c00, cur 1551127940 expire 1551127790 last 1551127713 Feb 25 12:52:20 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 25 12:53:58 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 12:53:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 12:57:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 206dd526-56ca-7f0f-cc93-b55e80ec3979 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca2a49a6800, cur 1551128241 expire 1551128091 last 1551128014 Feb 25 12:57:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 13:04:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 67f0d42c-d76b-f97b-26ce-7d40fa9782ff (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9c6b99f000, cur 1551128669 expire 1551128519 last 1551128442 Feb 25 13:04:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 13:05:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 13:05:52 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 13:13:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9bb2e785-62db-f92a-cba1-7c1565dca222 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa222a6800, cur 1551129233 expire 1551129083 last 1551129006 Feb 25 13:13:53 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 13:20:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.26.31@o2ib6) Feb 25 13:20:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 13:20:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3aa13807-7e41-ad0d-47ec-3103159f30b6 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb9636d4400, cur 1551129648 expire 1551129498 last 1551129421 Feb 25 13:20:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 13:30:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 13:30:17 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Feb 25 13:30:52 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 49b9ca42-388c-9c00-1abe-001e338e72bf (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbabdb46000, cur 1551130252 expire 1551130102 last 1551130025 Feb 25 13:30:52 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 13:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 436f530a-f45e-a33c-6a0d-a3414701ff7b (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb0bce25400, cur 1551130921 expire 1551130771 last 1551130694 Feb 25 13:42:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 13:42:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 13:42:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 13:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0560ee03-cb0d-3ab6-6564-7a223fe93704 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb19dfcdc00, cur 1551131618 expire 1551131468 last 1551131391 Feb 25 13:53:38 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 13:53:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.31@o2ib6) Feb 25 13:53:51 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 14:04:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client feac05a4-716f-c34a-fd9d-1220a521af0c (at 10.9.107.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c980ee4c800, cur 1551132283 expire 1551132133 last 1551132056 Feb 25 14:04:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 14:10:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6e4158f9-b6ab-ef99-1b0e-b1d638b29462 (at 10.9.106.16@o2ib4) Feb 25 14:10:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 14:26:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7c04f965-01e9-b008-bb36-681b1688b172 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c950d4fa400, cur 1551133592 expire 1551133442 last 1551133365 Feb 25 14:26:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 14:28:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.69@o2ib4) Feb 25 14:28:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 14:39:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 294204a3-86f8-ea6d-5140-898a94747d5f (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9bace29800, cur 1551134392 expire 1551134242 last 1551134165 Feb 25 14:39:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 14:40:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 14:40:42 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 14:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f272eff7-c7bc-edf0-a0c0-52f99d872519 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb1e9ff4400, cur 1551134669 expire 1551134519 last 1551134442 Feb 25 14:44:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 14:45:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6763c657-fada-25a2-6788-620374ff78bc (at 10.8.18.34@o2ib6) in 159 seconds. I think it's dead, and I am evicting it. exp ffff9c9ce9b37400, cur 1551134745 expire 1551134595 last 1551134586 Feb 25 14:45:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 14:57:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ae6b1dd5-45e1-41b8-a174-9deffa1f0fce (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cac472bb000, cur 1551135440 expire 1551135290 last 1551135213 Feb 25 14:57:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 15:00:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 15:00:02 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 15:03:49 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 93d66b19-bd87-f44e-7073-5f8ef5b84a9d (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc4826f7c00, cur 1551135829 expire 1551135679 last 1551135602 Feb 25 15:03:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 15:33:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a7b8045b-d476-b776-aa76-f8bea0000bff (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc1e37ddc00, cur 1551137624 expire 1551137474 last 1551137397 Feb 25 15:33:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 15:33:57 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a7b8045b-d476-b776-aa76-f8bea0000bff (at 10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9518eaa400, cur 1551137637 expire 1551137487 last 1551137410 Feb 25 15:33:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 15:34:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.9.8@o2ib6) Feb 25 15:34:11 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 15:53:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8402838d-f75d-af53-0850-df7f3dd913ea (at 10.9.104.58@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa05626800, cur 1551138839 expire 1551138689 last 1551138612 Feb 25 16:02:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 33e83900-2c88-5de3-63d8-fe6b66d8cbe3 (at 10.9.103.40@o2ib4) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9c9837662800, cur 1551139340 expire 1551139190 last 1551139113 Feb 25 16:02:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:02:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ec0647d1-f1b9-85b5-d320-be89cdc060c1 (at 10.9.103.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca82b741400, cur 1551139344 expire 1551139194 last 1551139117 Feb 25 16:02:24 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 16:13:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9a3e8168-75c8-46f5-0e45-82fbc55de66d (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbd6ff1e000, cur 1551140028 expire 1551139878 last 1551139801 Feb 25 16:13:48 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 16:14:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 16:14:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:15:04 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 54d2b4c2-0258-d953-5231-1e714b15b030 (at 10.8.18.35@o2ib6) in 205 seconds. I think it's dead, and I am evicting it. exp ffff9cd1452a5c00, cur 1551140104 expire 1551139954 last 1551139899 Feb 25 16:15:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:23:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 19a5af77-32db-6a2c-514e-095bb62b31d6 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca7483efc00, cur 1551140589 expire 1551140439 last 1551140362 Feb 25 16:23:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:23:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 16:23:33 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 16:24:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 220b49be-4c22-e196-0372-93fb51bb5b15 (at 10.8.18.34@o2ib6) in 223 seconds. I think it's dead, and I am evicting it. exp ffff9cc341287400, cur 1551140665 expire 1551140515 last 1551140442 Feb 25 16:24:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:24:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client dd698de2-83db-0763-25f4-e9a8990b9a67 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3a6f26c00, cur 1551140669 expire 1551140519 last 1551140442 Feb 25 16:24:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 16:25:09 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.40@o2ib4) Feb 25 16:25:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:27:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ec0647d1-f1b9-85b5-d320-be89cdc060c1 (at 10.9.103.42@o2ib4) Feb 25 16:27:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 16:33:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.44@o2ib4) Feb 25 16:33:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 16:36:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ec385dc1-10a6-ea22-c636-9ff43910b33d (at 10.8.14.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9d6c319000, cur 1551141409 expire 1551141259 last 1551141182 Feb 25 16:36:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ec385dc1-10a6-ea22-c636-9ff43910b33d (at 10.8.14.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cacbf39e800, cur 1551141415 expire 1551141265 last 1551141188 Feb 25 16:36:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 16:54:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5e4c7a11-f131-2f99-ca39-7ac53f68733a (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb2fc765800, cur 1551142446 expire 1551142296 last 1551142219 Feb 25 16:55:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 16:55:22 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 17:03:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 06f9d9f1-7bd0-5264-d60d-bdb6943859a0 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbe19b4d800, cur 1551143028 expire 1551142878 last 1551142801 Feb 25 17:03:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 17:04:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 17:04:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 17:35:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 59f57a83-3792-f137-abcf-4d866e4efc34 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca1bd6b0000, cur 1551144937 expire 1551144787 last 1551144710 Feb 25 17:35:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 17:36:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6d7e0734-7e24-57eb-b6d2-82e887ce4ecf (at 10.8.18.34@o2ib6) Feb 25 17:36:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 17:42:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8a46a015-1bb5-c1bc-442c-693f3f87856f (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc3f03d7c00, cur 1551145322 expire 1551145172 last 1551145095 Feb 25 17:42:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 17:42:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b0fcf357-e4ba-e6b2-e3a5-f221a20f2e8a (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc240f70400, cur 1551145329 expire 1551145179 last 1551145102 Feb 25 17:44:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 17:44:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 17:51:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8f8203d6-652b-3662-9526-9e3d8f796fde (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca8b4ecec00, cur 1551145901 expire 1551145751 last 1551145674 Feb 25 17:51:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 17:52:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 17:52:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:01:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client be722a38-7583-32df-4f2f-63157822d9f1 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ccf0beb8400, cur 1551146467 expire 1551146317 last 1551146240 Feb 25 18:01:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:01:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 18:01:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:05:34 fir-md1-s1 kernel: EXT4-fs (sdk2): error count since last fsck: 5 Feb 25 18:05:34 fir-md1-s1 kernel: EXT4-fs (sdk2): initial error at time 1550022155: ext4_mb_generate_buddy:757 Feb 25 18:05:34 fir-md1-s1 kernel: EXT4-fs (sdk2): last error at time 1550448029: ext4_mb_generate_buddy:757 Feb 25 18:10:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d94d3be2-2e52-9c9e-d516-b2f07c1624ad (at 10.9.104.42@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9f93e15800, cur 1551147034 expire 1551146884 last 1551146807 Feb 25 18:10:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:11:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 18:11:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:13:25 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551147198/real 1551147198] req@ffff9caa07a1ef00 x1625959757541488/t0(0) o106->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551147205 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 18:13:25 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 18:13:46 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551147219/real 1551147219] req@ffff9caa07a1ef00 x1625959757541488/t0(0) o106->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551147226 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 18:13:46 fir-md1-s1 kernel: Lustre: 
21973:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 18:14:28 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551147261/real 1551147261] req@ffff9caa07a1ef00 x1625959757541488/t0(0) o106->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551147268 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 18:14:28 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 25 18:15:45 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551147338/real 1551147338] req@ffff9caa07a1ef00 x1625959757541488/t0(0) o106->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551147345 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 18:15:45 fir-md1-s1 kernel: Lustre: 21973:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages Feb 25 18:16:21 fir-md1-s1 kernel: LustreError: 47903:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9c9fa2162100 x1625959758100064 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9ca7b0778b40/0xb7044c6446a58520 lrc: 4/0,0 mode: PR/PR res: [0x2c0007181:0xc64:0x0].0x0 bits 0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xf4cd71c0f2c29891 expref: 3231 pid: 22177 timeout: 1132453 lvb_type: 0 Feb 25 18:16:21 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 18:16:21 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 105s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9ca7b0778b40/0xb7044c6446a58520 lrc: 3/0,0 mode: PR/PR res: [0x2c0007181:0xc64:0x0].0x0 bits 
0x13/0x0 rrc: 7 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xf4cd71c0f2c29891 expref: 3232 pid: 22177 timeout: 0 lvb_type: 0
Feb 25 18:16:38 fir-md1-s1 kernel: LNet: Service thread pid 21973 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Feb 25 18:16:38 fir-md1-s1 kernel: Pid: 21973, comm: mdt01_040 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018
Feb 25 18:16:38 fir-md1-s1 kernel: Call Trace:
Feb 25 18:16:38 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x500/0x8d0 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] ldlm_glimpse_locks+0x3b/0x100 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] mdt_do_glimpse+0x1e9/0x4c0 [mdt]
Feb 25 18:16:38 fir-md1-s1 kernel: [] mdt_glimpse_enqueue+0x3d3/0x4f0 [mdt]
Feb 25 18:16:38 fir-md1-s1 kernel: [] mdt_intent_glimpse+0x1f/0x30 [mdt]
Feb 25 18:16:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt]
Feb 25 18:16:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
Feb 25 18:16:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0
Feb 25 18:16:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21
Feb 25 18:16:38 fir-md1-s1 kernel: [] 0xffffffffffffffff
Feb 25 18:16:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551147398.21973
Feb 25 18:16:53 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e70b263b-d75a-a0e0-612f-517d4b2fdf9c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it.
exp ffff9cbd3f7bbc00, cur 1551147413 expire 1551147263 last 1551147186 Feb 25 18:16:53 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 25 18:16:59 fir-md1-s1 kernel: LNet: Service thread pid 21973 completed after 220.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Feb 25 18:19:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a17b147f-c4f5-4427-2b99-c1baf15f83bd (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc23feecc00, cur 1551147597 expire 1551147447 last 1551147370 Feb 25 18:19:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 18:20:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 18:20:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:32:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.38@o2ib4) Feb 25 18:32:44 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 18:37:43 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551148656/real 1551148656] req@ffff9ca89526b000 x1625959771878672/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551148663 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 18:37:43 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 16 previous similar messages Feb 25 18:38:04 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551148677/real 1551148677] req@ffff9ca89526b000 x1625959771878672/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551148684 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 18:38:04 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous 
similar messages Feb 25 18:38:46 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551148719/real 1551148719] req@ffff9ca89526b000 x1625959771878672/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551148726 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 18:38:46 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 25 18:40:03 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551148796/real 1551148796] req@ffff9ca89526b000 x1625959771878672/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551148803 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 18:40:03 fir-md1-s1 kernel: Lustre: 22234:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Feb 25 18:40:10 fir-md1-s1 kernel: LustreError: 22234:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9ca89526b000 x1625959771878672 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9ca887a37bc0/0xb7044c644b013017 lrc: 4/0,0 mode: PW/PW res: [0x200006996:0x9:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x23048e567d65a488 expref: 187 pid: 22138 timeout: 1133932 lvb_type: 0 Feb 25 18:40:10 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 18:40:10 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9ca887a37bc0/0xb7044c644b013017 lrc: 3/0,0 mode: PW/PW res: [0x200006996:0x9:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 
0x23048e567d65a488 expref: 188 pid: 22138 timeout: 0 lvb_type: 0 Feb 25 18:41:17 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client da8757ff-50c9-c1f3-f5d8-9cf91df486ce (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c950ff57400, cur 1551148877 expire 1551148727 last 1551148650 Feb 25 18:41:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 18:42:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.43@o2ib4) Feb 25 18:42:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 18:45:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9da9cc6a-d27d-cf84-c5da-3812acf847ad (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb4f8056800, cur 1551149107 expire 1551148957 last 1551148880 Feb 25 18:45:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 18:48:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 819a2974-4f03-e3b0-601d-2d97317e6637 (at 10.9.103.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca82b742c00, cur 1551149293 expire 1551149143 last 1551149066 Feb 25 18:48:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 19:00:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client eb3b9013-3ee1-3a96-8e91-182000608f99 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9df662cc00, cur 1551150010 expire 1551149860 last 1551149783 Feb 25 19:00:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 19:00:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb3b9013-3ee1-3a96-8e91-182000608f99 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca8a43a4800, cur 1551150015 expire 1551149865 last 1551149788 Feb 25 19:00:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 19:06:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 19:06:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 25 19:21:10 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 819a2974-4f03-e3b0-601d-2d97317e6637 (at 10.9.103.3@o2ib4) Feb 25 19:21:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 19:26:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b3fce559-01af-1741-be02-c46bc4b5ebb8 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbc65efac00, cur 1551151613 expire 1551151463 last 1551151386 Feb 25 19:30:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 19:30:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 19:31:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 25dfc4d7-d4a2-e435-cafa-8075e28c59e0 (at 10.9.101.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca335b32c00, cur 1551151889 expire 1551151739 last 1551151662 Feb 25 19:31:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 19:31:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2c96ed5b-7b98-2819-9f2b-c8d6f7172439 (at 10.9.101.43@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca7e2216000, cur 1551151897 expire 1551151747 last 1551151670 Feb 25 19:31:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 19:40:45 fir-md1-s1 kernel: Lustre: 21826:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551152438/real 1551152438] req@ffff9cbca0bb4b00 x1625959812481504/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551152445 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 19:40:45 fir-md1-s1 kernel: Lustre: 21826:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 19:41:06 fir-md1-s1 kernel: Lustre: 21826:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551152459/real 1551152459] req@ffff9cbca0bb4b00 x1625959812481504/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551152466 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 19:41:06 fir-md1-s1 kernel: Lustre: 21826:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 19:41:49 fir-md1-s1 kernel: Lustre: 21826:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551152501/real 1551152501] req@ffff9cbca0bb4b00 x1625959812481504/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551152508 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 19:41:49 fir-md1-s1 kernel: Lustre: 21826:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 25 19:43:04 fir-md1-s1 kernel: Lustre: 50316:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551152577/real 1551152577] req@ffff9cbf32308900 x1625959814047488/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551152584 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 19:43:04 fir-md1-s1 kernel: Lustre: 50316:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar 
messages Feb 25 19:43:13 fir-md1-s1 kernel: LustreError: 21826:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9cbca0bb4b00 x1625959812481504 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9caa9d5260c0/0xb7044c6454cbaecd lrc: 4/0,0 mode: PR/PR res: [0x2c0007181:0xc64:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xbefec15fe3acf20d expref: 37 pid: 22177 timeout: 1137714 lvb_type: 0 Feb 25 19:43:13 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 19:43:13 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 155s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9caa9d5260c0/0xb7044c6454cbaecd lrc: 3/0,0 mode: PR/PR res: [0x2c0007181:0xc64:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xbefec15fe3acf20d expref: 38 pid: 22177 timeout: 0 lvb_type: 0 Feb 25 19:43:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8d831dd0-3dd8-79f5-f509-c084d8b78f7a (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc1f5303400, cur 1551152638 expire 1551152488 last 1551152411 Feb 25 19:47:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 19:47:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 20:01:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6ad65c1d-1c56-46ff-89db-441d3f64285e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb785b68400, cur 1551153664 expire 1551153514 last 1551153437 Feb 25 20:01:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 20:02:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.43@o2ib4) Feb 25 20:02:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 20:10:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 20:10:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 20:14:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 044d49a5-61d5-0c66-16c0-c4250e19a31c (at 10.8.13.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cabfabe6c00, cur 1551154464 expire 1551154314 last 1551154237 Feb 25 20:14:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 20:14:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 044d49a5-61d5-0c66-16c0-c4250e19a31c (at 10.8.13.16@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9f6dd5ec00, cur 1551154482 expire 1551154332 last 1551154255 Feb 25 20:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f7811602-a35e-c9f6-69af-1aa2c35d6293 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caba7af1400, cur 1551155829 expire 1551155679 last 1551155602 Feb 25 20:37:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 20:38:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 20:38:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 20:46:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ca0be302-894c-ca1c-bd93-173e301ee769 (at 10.8.13.16@o2ib6) Feb 25 20:46:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:00:26 fir-md1-s1 kernel: Lustre: 22267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551157219/real 1551157219] req@ffff9cb28d7bd700 x1625959866404992/t0(0) o104->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551157226 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 21:00:26 fir-md1-s1 kernel: Lustre: 22267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Feb 25 21:00:47 fir-md1-s1 kernel: Lustre: 22267:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551157240/real 1551157240] req@ffff9cb28d7bd700 x1625959866404992/t0(0) o104->fir-MDT0002@10.8.11.9@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551157247 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 21:00:47 fir-md1-s1 kernel: Lustre: 22267:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 21:00:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2686cabc-4957-43b2-fd06-e63bbe689c41 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc35afbd400, cur 1551157255 expire 1551157105 last 1551157028 Feb 25 21:00:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:01:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2686cabc-4957-43b2-fd06-e63bbe689c41 (at 10.8.11.9@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9cac1a75a000, cur 1551157261 expire 1551157111 last 1551157034 Feb 25 21:01:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 21:03:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 74a03b4f-d0c9-a84a-2aa4-fc50ef9db767 (at 10.8.11.9@o2ib6) Feb 25 21:03:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:12:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f411b7e5-c3ac-92fa-a719-7c8c471e0ce5 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cabdfff8400, cur 1551157969 expire 1551157819 last 1551157742 Feb 25 21:12:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f411b7e5-c3ac-92fa-a719-7c8c471e0ce5 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc458352800, cur 1551157978 expire 1551157828 last 1551157751 Feb 25 21:13:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6d7e0734-7e24-57eb-b6d2-82e887ce4ecf (at 10.8.18.34@o2ib6) Feb 25 21:13:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:24:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2a5bd73b-b039-4236-618f-951a1ce35476 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c950bfd2000, cur 1551158657 expire 1551158507 last 1551158430 Feb 25 21:24:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 21:24:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2a5bd73b-b039-4236-618f-951a1ce35476 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cba9c309800, cur 1551158663 expire 1551158513 last 1551158436 Feb 25 21:25:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4d8bcdc8-d35e-7acb-6372-5603c5ac3d2b (at 10.8.18.35@o2ib6) in 179 seconds. I think it's dead, and I am evicting it. 
exp ffff9cafdd606c00, cur 1551158733 expire 1551158583 last 1551158554 Feb 25 21:25:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 21:25:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d8bcdc8-d35e-7acb-6372-5603c5ac3d2b (at 10.8.18.35@o2ib6) in 185 seconds. I think it's dead, and I am evicting it. exp ffff9ca4093e1400, cur 1551158739 expire 1551158589 last 1551158554 Feb 25 21:25:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6d7e0734-7e24-57eb-b6d2-82e887ce4ecf (at 10.8.18.34@o2ib6) Feb 25 21:25:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:25:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 21:25:47 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:26:21 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 549f7cee-c540-f928-8d25-e0057364fa73 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc4537ddc00, cur 1551158781 expire 1551158631 last 1551158554 Feb 25 21:32:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 67078a83-293a-3a96-655b-e1d1456aff5d (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cad123bf000, cur 1551159175 expire 1551159025 last 1551158948 Feb 25 21:33:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 21:33:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:34:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f6596a8c-3f77-8521-5252-5a76feead9f0 (at 10.8.3.11@o2ib6) in 204 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc41de51800, cur 1551159251 expire 1551159101 last 1551159047 Feb 25 21:34:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:36:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 25 21:36:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:46:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7a9f52f3-fad7-d644-ae70-019d1a25b459 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa10e2d800, cur 1551159976 expire 1551159826 last 1551159749 Feb 25 21:46:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:47:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6d7e0734-7e24-57eb-b6d2-82e887ce4ecf (at 10.8.18.34@o2ib6) Feb 25 21:47:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:47:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6ad948c5-f32f-156a-eb67-ce3deb619a77 (at 10.8.18.35@o2ib6) in 208 seconds. I think it's dead, and I am evicting it. exp ffff9cd36eaf8400, cur 1551160052 expire 1551159902 last 1551159844 Feb 25 21:47:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:48:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 25 21:48:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:56:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0ce06e46-72c7-b07f-f910-fb248deafcb9 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca3499f9400, cur 1551160604 expire 1551160454 last 1551160377 Feb 25 21:56:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 21:59:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 21:59:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:00:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 16b26052-afa3-f913-3f13-3b653d2521a8 (at 10.8.30.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb78d2ff400, cur 1551160811 expire 1551160661 last 1551160584 Feb 25 22:00:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:20:47 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551162040/real 1551162040] req@ffff9c961e2b9500 x1625959889182944/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551162047 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 22:20:47 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 22:20:54 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551162047/real 1551162047] req@ffff9c961e2b9500 x1625959889182944/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551162054 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 22:21:08 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551162061/real 1551162061] req@ffff9c961e2b9500 x1625959889182944/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551162068 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 22:21:08 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 22:21:29 fir-md1-s1 kernel: 
Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551162082/real 1551162082] req@ffff9c961e2b9500 x1625959889182944/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551162089 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 22:21:29 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 25 22:22:11 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551162124/real 1551162124] req@ffff9c961e2b9500 x1625959889182944/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551162131 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 22:22:11 fir-md1-s1 kernel: Lustre: 22254:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 25 22:23:14 fir-md1-s1 kernel: LustreError: 22254:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9c961e2b9500 x1625959889182944 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cbabce55a00/0xb7044c6473266257 lrc: 4/0,0 mode: PR/PR res: [0x2c0007626:0x3e5e:0x0].0x0 bits 0x13/0x0 rrc: 63 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x3d32a58c690067f0 expref: 89 pid: 50303 timeout: 1147316 lvb_type: 0 Feb 25 22:23:14 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 22:23:14 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cbabce55a00/0xb7044c6473266257 lrc: 3/0,0 mode: PR/PR res: [0x2c0007626:0x3e5e:0x0].0x0 bits 0x13/0x0 rrc: 63 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x3d32a58c690067f0 expref: 90 pid: 50303 
timeout: 0 lvb_type: 0 Feb 25 22:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client da953acd-2596-440d-58c4-8374a6a21470 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc2b1a2e800, cur 1551162248 expire 1551162098 last 1551162021 Feb 25 22:24:08 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 25 22:24:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 22:24:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:29:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ee0af47e-89ac-3267-6f3f-6741e244f0eb (at 10.8.24.30@o2ib6) Feb 25 22:29:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:29:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.30.23@o2ib6) Feb 25 22:29:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:30:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 47fad81f-b096-ac9a-4bb2-3c0bd8f3f55d (at 10.8.24.26@o2ib6) Feb 25 22:30:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:30:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6bd4729-6340-f90c-f09c-89c915400705 (at 10.8.30.9@o2ib6) Feb 25 22:30:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 25 22:30:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 340fde16-d5c9-2ec3-b5c5-4d2188018462 (at 10.8.30.10@o2ib6) Feb 25 22:30:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 22:34:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0ea10329-d4a8-3b07-3ddd-dc7d4d43e072 (at 10.8.30.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbb592b3400, cur 1551162840 expire 1551162690 last 1551162613 Feb 25 22:34:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 22:34:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to b6bd4729-6340-f90c-f09c-89c915400705 (at 10.8.30.9@o2ib6) Feb 25 22:34:08 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 25 22:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ecc6aef0-dc0a-3dfa-b81e-4b6bace22b92 (at 10.8.30.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca4a9704800, cur 1551162855 expire 1551162705 last 1551162628 Feb 25 23:11:23 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551165076/real 1551165076] req@ffff9cb35ab90900 x1625959901690528/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551165083 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 25 23:11:23 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Feb 25 23:11:37 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551165090/real 1551165090] req@ffff9cb35ab90900 x1625959901690528/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551165097 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 23:11:37 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 25 23:11:58 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551165111/real 1551165111] req@ffff9cb35ab90900 x1625959901690528/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551165118 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 23:11:58 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous 
similar messages Feb 25 23:12:40 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551165153/real 1551165153] req@ffff9cb35ab90900 x1625959901690528/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551165160 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 25 23:12:40 fir-md1-s1 kernel: Lustre: 22211:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Feb 25 23:13:50 fir-md1-s1 kernel: LustreError: 22211:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9cb35ab90900 x1625959901690528 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cabbb6c8240/0xb7044c6476ffeb67 lrc: 4/0,0 mode: PR/PR res: [0x2c0007626:0x3e5e:0x0].0x0 bits 0x13/0x0 rrc: 250 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xa7c00ab130a8f4d6 expref: 74 pid: 21931 timeout: 1150352 lvb_type: 0 Feb 25 23:13:50 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 25 23:13:50 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cabbb6c8240/0xb7044c6476ffeb67 lrc: 3/0,0 mode: PR/PR res: [0x2c0007626:0x3e5e:0x0].0x0 bits 0x13/0x0 rrc: 250 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xa7c00ab130a8f4d6 expref: 75 pid: 21931 timeout: 0 lvb_type: 0 Feb 25 23:14:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 62b86656-e49b-4aad-513b-c6fdd686a279 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca92520c000, cur 1551165292 expire 1551165142 last 1551165065 Feb 25 23:14:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 23:16:14 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 23:16:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 23:20:12 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 23:20:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 23:20:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 857874a0-6c65-87fc-b21e-d57cfb925c3f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c960d0e6400, cur 1551165653 expire 1551165503 last 1551165426 Feb 25 23:20:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 25 23:30:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cffdbbde-fe7e-e65c-96dd-0c657ff57df3 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb9ddf1ac00, cur 1551166218 expire 1551166068 last 1551165991 Feb 25 23:30:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 23:38:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 23:38:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 23:51:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 581597c4-fc7f-ab16-a755-4b98fb05a39e (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca9aca87800, cur 1551167486 expire 1551167336 last 1551167259 Feb 25 23:51:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 23:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 581597c4-fc7f-ab16-a755-4b98fb05a39e (at 10.8.11.9@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9cc19fb63800, cur 1551167503 expire 1551167353 last 1551167276 Feb 25 23:51:43 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 25 23:56:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 74a03b4f-d0c9-a84a-2aa4-fc50ef9db767 (at 10.8.11.9@o2ib6) Feb 25 23:56:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 25 23:56:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 25 23:56:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 00:00:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a82d0417-5354-7090-8c74-27f558bf90cb (at 10.9.103.27@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9965706400, cur 1551168012 expire 1551167862 last 1551167785 Feb 26 00:00:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 00:34:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.27@o2ib4) Feb 26 00:34:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 00:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ab18c7b8-19fe-0a85-20d5-e8ebe1d8b280 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca23b423000, cur 1551171116 expire 1551170966 last 1551170889 Feb 26 00:51:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 00:56:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 26 00:56:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 01:57:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 42e3d8b1-2fbe-8c39-9aa5-ba3ebc735d9f (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca7a179ac00, cur 1551175062 expire 1551174912 last 1551174835 Feb 26 01:57:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 01:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 42e3d8b1-2fbe-8c39-9aa5-ba3ebc735d9f (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbef87df400, cur 1551175063 expire 1551174913 last 1551174836 Feb 26 02:10:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 06a0640d-de7f-9715-947b-5ac203d15e9f (at 10.0.10.3@o2ib7) reconnecting Feb 26 02:10:00 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.10.3@o2ib7, removing former export from same NID Feb 26 02:10:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 06a0640d-de7f-9715-947b-5ac203d15e9f (at 10.0.10.3@o2ib7) Feb 26 02:10:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 03:32:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0909ba82-9930-4742-92e1-1f27f6cb6323 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca977aba800, cur 1551180770 expire 1551180620 last 1551180543 Feb 26 03:32:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 03:40:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 03:40:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 03:43:15 fir-md1-s1 kernel: Lustre: 51451:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551181388/real 1551181388] req@ffff9cd177236f00 x1625959970715696/t0(0) o104->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551181395 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 03:43:15 fir-md1-s1 kernel: Lustre: 51451:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Feb 26 03:43:29 fir-md1-s1 kernel: Lustre: 51451:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551181402/real 1551181402] req@ffff9cd177236f00 x1625959970715696/t0(0) o104->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551181409 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 03:43:29 fir-md1-s1 kernel: Lustre: 51451:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 03:43:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7319d1c9-6e29-bb1b-6c4e-7953610895a2 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca7f5b5e000, cur 1551181425 expire 1551181275 last 1551181198 Feb 26 03:43:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 03:43:50 fir-md1-s1 kernel: Lustre: 51451:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551181423/real 1551181423] req@ffff9cd177236f00 x1625959970715696/t0(0) o104->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551181430 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 03:43:50 fir-md1-s1 kernel: Lustre: 51451:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 03:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7319d1c9-6e29-bb1b-6c4e-7953610895a2 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc4a535dc00, cur 1551181440 expire 1551181290 last 1551181213 Feb 26 03:44:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 03:44:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 03:44:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 03:44:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 26 03:44:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 03:45:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fd2175c4-6e22-c21f-3953-e6131baee4bf (at 10.8.18.35@o2ib6) in 217 seconds. I think it's dead, and I am evicting it. exp ffff9c9c25867000, cur 1551181501 expire 1551181351 last 1551181284 Feb 26 03:45:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 03:53:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0e938983-d3df-1102-1bfb-0c5d01e88c96 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb45862c800, cur 1551182001 expire 1551181851 last 1551181774 Feb 26 03:53:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 03:54:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 26 03:54:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 04:02:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 109dabda-4797-b0a5-89ea-028612db99a3 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb12d229c00, cur 1551182561 expire 1551182411 last 1551182334 Feb 26 04:02:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 04:04:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.26.33@o2ib6) Feb 26 04:04:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 04:45:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd83aa88-0d86-0b70-e3b2-63276f1aee23 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb43f2bc000, cur 1551185155 expire 1551185005 last 1551184928 Feb 26 04:45:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 04:46:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd83aa88-0d86-0b70-e3b2-63276f1aee23 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ccd626f2800, cur 1551185172 expire 1551185022 last 1551184945 Feb 26 04:46:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 04:48:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 26 04:48:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 05:12:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 05:12:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 05:13:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 911bba1d-aac3-a561-aabc-a836fe660fe3 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caaccff3400, cur 1551186785 expire 1551186635 last 1551186558 Feb 26 05:14:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3c8de293-496c-dae1-016a-77f5d18cf419 (at 10.8.11.9@o2ib6) in 157 seconds. I think it's dead, and I am evicting it. exp ffff9cb4a5ab2000, cur 1551186861 expire 1551186711 last 1551186704 Feb 26 05:14:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 05:15:31 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 910cca72-fbc3-4a9d-3f73-3a077a200e69 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98af988000, cur 1551186931 expire 1551186781 last 1551186704 Feb 26 05:15:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 05:17:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 74a03b4f-d0c9-a84a-2aa4-fc50ef9db767 (at 10.8.11.9@o2ib6) Feb 26 05:17:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:10:41 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 6ba849c0-4d47-300c-c82a-86ab6fe29a8a (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cccd52b3400, cur 1551190241 expire 1551190091 last 1551190014 Feb 26 06:11:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:11:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:15:07 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a2d7f602-beb3-ff48-d0f1-d0b5f2324af1 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca49ca30c00, cur 1551190507 expire 1551190357 last 1551190280 Feb 26 06:15:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:15:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 76c51379-dde3-4295-8312-25d0f46055a9 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3baed4c00, cur 1551190510 expire 1551190360 last 1551190283 Feb 26 06:15:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:15:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:22:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client bd2d5f7f-4b3c-0d29-45f5-a53d85119c72 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cabd662fc00, cur 1551190931 expire 1551190781 last 1551190704 Feb 26 06:22:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 06:22:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:22:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:27:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 22a5922f-9d58-d21e-7a88-c946eba96d2d (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca32c6d2c00, cur 1551191250 expire 1551191100 last 1551191023 Feb 26 06:27:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:30:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:30:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:34:10 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client f24044a1-89a2-ff1e-1fcd-451f2b0fc9b9 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb12435e800, cur 1551191650 expire 1551191500 last 1551191423 Feb 26 06:34:10 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:34:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:34:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:41:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 71bf8a1a-0452-bb7d-4be1-13ec78d48399 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb30d716c00, cur 1551192079 expire 1551191929 last 1551191852 Feb 26 06:41:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:42:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:42:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:46:02 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cd946d37-3a66-a68f-3462-5824ba2cb1fe (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cad8c38d400, cur 1551192362 expire 1551192212 last 1551192135 Feb 26 06:46:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:46:33 fir-md1-s1 kernel: Lustre: 21906:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0000: Failure to clear the changelog for user 1: -22 Feb 26 06:46:33 fir-md1-s1 kernel: Lustre: 21906:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Feb 26 06:46:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 06:46:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:50:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 21e60294-3604-74ca-986d-7feabe89217b (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca600f32c00, cur 1551192640 expire 1551192490 last 1551192413 Feb 26 06:50:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:51:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 26 06:51:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 06:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d8f8750c-f7b2-c810-7c7b-ca2259742a22 (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caac93f8800, cur 1551193076 expire 1551192926 last 1551192849 Feb 26 06:57:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 07:00:16 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 07:00:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 07:04:03 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c031e00f-62c4-80cc-850e-cde2362eefaf (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cabfd7c3c00, cur 1551193443 expire 1551193293 last 1551193216 Feb 26 07:04:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 07:04:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 252849ea-4610-daa4-240b-4bf6ea8ff8a2 (at 10.8.18.35@o2ib6) Feb 26 07:04:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 07:13:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a1aab7fc-9d47-9d13-6399-d761c763291c (at 10.8.18.35@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca760350c00, cur 1551194018 expire 1551193868 last 1551193791 Feb 26 07:13:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 07:16:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 26 07:16:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 07:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d299f05-b526-db6f-54d0-b5a94e0b6d6e (at 10.8.25.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca85822c800, cur 1551194969 expire 1551194819 last 1551194742 Feb 26 07:29:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 07:46:51 fir-md1-s1 kernel: LNet: Service thread pid 22156 was inactive for 200.45s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 26 07:46:51 fir-md1-s1 kernel: Pid: 22156, comm: mdt00_039 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 07:46:51 fir-md1-s1 kernel: Call Trace: Feb 26 07:46:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 07:46:51 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 07:46:51 fir-md1-s1 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Feb 26 07:46:51 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Feb 26 07:46:51 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 07:46:51 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 07:46:51 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 07:46:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 07:46:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 07:46:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 07:46:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551196011.22156 Feb 26 07:46:53 fir-md1-s1 kernel: LNet: Service thread pid 21885 was inactive for 201.99s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 26 07:46:53 fir-md1-s1 kernel: Pid: 21885, comm: mdt00_014 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 07:46:53 fir-md1-s1 kernel: Call Trace: Feb 26 07:46:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 07:46:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 07:46:53 fir-md1-s1 kernel: [] mdt_dom_discard_data+0x101/0x130 [mdt] Feb 26 07:46:53 fir-md1-s1 kernel: [] mdt_reint_unlink+0x331/0x14a0 [mdt] Feb 26 07:46:53 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 26 07:46:53 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 26 07:46:53 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 26 07:46:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 07:46:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 07:46:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 07:46:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 07:46:53 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 07:46:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 07:47:10 fir-md1-s1 kernel: LNet: Service thread pid 21882 was inactive for 200.54s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 26 07:47:10 fir-md1-s1 kernel: Pid: 21882, comm: mdt00_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 07:47:10 fir-md1-s1 kernel: Call Trace: Feb 26 07:47:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 07:47:10 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 07:47:10 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 07:47:10 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 07:47:10 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 07:47:10 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 07:47:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 07:47:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 07:47:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 07:47:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551196030.21882 Feb 26 07:48:31 fir-md1-s1 kernel: LustreError: 21885:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551195811, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca3a2037bc0/0xb7044c649884b346 lrc: 3/0,1 mode: --/PW res: [0x2c0007301:0x2f0:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 21885 timeout: 0 lvb_type: 0 
Feb 26 07:48:31 fir-md1-s1 kernel: LustreError: 22156:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551195811, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9c9ec8ec0fc0/0xb7044c649884b33f lrc: 3/1,0 mode: --/PR res: [0x2c0007301:0x2f0:0x0].0x0 bits 0x13/0x8 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22156 timeout: 0 lvb_type: 0 Feb 26 07:48:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551196111.22156 Feb 26 07:48:49 fir-md1-s1 kernel: LustreError: 21882:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551195829, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ca456ec5100/0xb7044c64988bacd2 lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21882 timeout: 0 lvb_type: 0 Feb 26 07:52:43 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 0d5af6de-ba6f-a8f8-db2a-a151ccd51175 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cca57257c00, cur 1551196363 expire 1551196213 last 1551196136 Feb 26 07:52:43 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 26 07:53:26 fir-md1-s1 kernel: Lustre: 21605:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9c9b12632a00 x1626126029990768/t0(0) o101->25348431-febb-9cbc-ab46-773891c4a456@10.9.101.13@o2ib4:591/0 lens 576/3264 e 24 to 0 dl 1551196411 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 07:53:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 07:53:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 07:53:32 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 07:53:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 07:53:44 fir-md1-s1 kernel: Lustre: 21605:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9c94ec029e00 x1626180601650208/t0(0) o101->5a7ede41-749c-11f0-4812-c7172f7ef9a2@10.9.101.2@o2ib4:609/0 lens 616/3264 e 24 to 0 dl 1551196429 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 07:53:44 fir-md1-s1 kernel: Lustre: 21605:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Feb 26 07:53:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5a7ede41-749c-11f0-4812-c7172f7ef9a2 (at 10.9.101.2@o2ib4) reconnecting Feb 26 07:57:47 fir-md1-s1 kernel: LNet: Service thread pid 22211 was inactive for 200.24s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 26 07:57:47 fir-md1-s1 kernel: Pid: 22211, comm: mdt01_079 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 07:57:47 fir-md1-s1 kernel: Call Trace: Feb 26 07:57:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 07:57:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 07:57:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 07:57:47 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 07:57:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 07:57:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 07:57:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 07:57:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 07:57:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 07:57:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551196667.22211 Feb 26 07:58:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.25.33@o2ib6) Feb 26 07:58:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 07:59:26 fir-md1-s1 kernel: LustreError: 22211:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551196466, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc814be2400/0xb7044c649a59902a lrc: 3/1,0 
mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22211 timeout: 0 lvb_type: 0 Feb 26 08:01:44 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551196893/real 1551196893] req@ffff9c94f9df2100 x1625960021964496/t0(0) o104->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551196904 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 08:01:44 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 08:01:55 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551196904/real 1551196904] req@ffff9c94f9df2100 x1625960021964496/t0(0) o104->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551196915 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:02:06 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551196915/real 1551196915] req@ffff9c94f9df2100 x1625960021964496/t0(0) o104->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551196926 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:02:28 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 668992fe-dbd6-c874-cc61-8fed39ac45eb (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb9fe206000, cur 1551196948 expire 1551196798 last 1551196721 Feb 26 08:02:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 08:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 08:03:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 08:03:33 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 26 08:03:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5a7ede41-749c-11f0-4812-c7172f7ef9a2 (at 10.9.101.2@o2ib4) reconnecting Feb 26 08:03:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 08:04:21 fir-md1-s1 kernel: Lustre: 21952:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cad347ca100 x1626113273964752/t0(0) o101->435afb47-1e94-e7f6-b446-81449fd48e09@10.9.101.25@o2ib4:491/0 lens 616/3264 e 24 to 0 dl 1551197066 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 08:04:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 435afb47-1e94-e7f6-b446-81449fd48e09 (at 10.9.101.25@o2ib4) reconnecting Feb 26 08:11:46 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 26 08:11:46 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 08:12:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e906aeae-ab64-24f4-4215-89a5b36c51e0 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca9d0aa4800, cur 1551197524 expire 1551197374 last 1551197297 Feb 26 08:12:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 08:13:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 08:13:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 08:13:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 5a7ede41-749c-11f0-4812-c7172f7ef9a2 (at 10.9.101.2@o2ib4) reconnecting Feb 26 08:14:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 435afb47-1e94-e7f6-b446-81449fd48e09 (at 10.9.101.25@o2ib4) reconnecting Feb 26 08:22:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7f18a6dc-08e0-1658-7449-c7cf75edac4e (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cacb2ef9000, cur 1551198163 expire 1551198013 last 1551197936 Feb 26 08:22:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 08:23:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6d7e0734-7e24-57eb-b6d2-82e887ce4ecf (at 10.8.18.34@o2ib6) Feb 26 08:23:15 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 08:23:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 08:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 08:33:36 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 08:33:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 08:33:36 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 08:43:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 08:43:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 08:43:37 
fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 08:43:37 fir-md1-s1 kernel: Lustre: Skipped 18 previous similar messages Feb 26 08:44:48 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551199481/real 1551199481] req@ffff9c9c1a150c00 x1625960028968832/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551199488 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 08:44:48 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 08:44:55 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551199488/real 1551199488] req@ffff9c9c1a150c00 x1625960028968832/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551199495 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:45:02 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551199495/real 1551199495] req@ffff9c9c1a150c00 x1625960028968832/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551199502 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:45:16 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551199509/real 1551199509] req@ffff9c9c1a150c00 x1625960028968832/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551199516 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:45:16 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 08:45:37 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551199530/real 1551199530] req@ffff9c9c1a150c00 
x1625960028968832/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551199537 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:45:37 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 08:46:19 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551199572/real 1551199572] req@ffff9c9c1a150c00 x1625960028968832/t0(0) o106->fir-MDT0000@10.8.15.7@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551199579 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 08:46:19 fir-md1-s1 kernel: Lustre: 22194:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 26 08:46:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8c7c352c-0786-b576-65c8-08222987391a (at 10.8.15.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb907ad400, cur 1551199618 expire 1551199468 last 1551199391 Feb 26 08:46:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 08:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0a3359cf-f01b-f0ca-47f5-3d722a20fa29 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc3b2a64000, cur 1551199812 expire 1551199662 last 1551199585 Feb 26 08:50:12 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 08:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 08:53:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 08:53:38 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 26 08:53:38 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 08:57:22 fir-md1-s1 kernel: Lustre: 21952:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551200235/real 1551200235] req@ffff9ca746a82700 x1625960030990896/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551200242 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 08:57:22 fir-md1-s1 kernel: Lustre: 21952:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 26 08:59:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 584a550f-058e-90b1-f777-04bc395e1bdc (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb40ae5d000, cur 1551200383 expire 1551200233 last 1551200156 Feb 26 08:59:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 09:03:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 09:03:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 09:03:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 09:03:39 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 09:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client da75353c-11a4-d7b7-34db-4dc9a8bb77af (at 10.8.3.11@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9caa38f73800, cur 1551200953 expire 1551200803 last 1551200726 Feb 26 09:09:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 09:13:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 09:13:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 09:13:40 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 09:13:40 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 09:23:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 09:23:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 09:23:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 09:23:42 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 09:28:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cd0798b5-98a5-1d8c-9fd3-a878f63429f4 (at 10.9.104.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ccbc4bac800, cur 1551202130 expire 1551201980 last 1551201903 Feb 26 09:28:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 09:33:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 09:33:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 09:33:43 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 09:33:43 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 09:43:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 09:43:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 09:43:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 09:43:44 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 09:53:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 09:53:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 09:53:45 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 09:53:45 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 09:56:16 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551203769/real 1551203769] req@ffff9cbd517a7b00 x1625960041592096/t0(0) o104->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551203776 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 09:56:16 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 20 previous similar messages Feb 26 09:56:37 fir-md1-s1 kernel: Lustre: 
22261:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551203790/real 1551203790] req@ffff9cbd517a7b00 x1625960041592096/t0(0) o104->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551203797 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 09:56:37 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 09:57:19 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551203832/real 1551203832] req@ffff9cbd517a7b00 x1625960041592096/t0(0) o104->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551203839 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 09:57:19 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 26 09:57:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4e438827-7ac6-b97d-f221-da77297fcae6 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbb9a334800, cur 1551203871 expire 1551203721 last 1551203644 Feb 26 09:57:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 09:58:36 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551203909/real 1551203909] req@ffff9cbd517a7b00 x1625960041592096/t0(0) o104->fir-MDT0002@10.8.8.26@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551203916 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 09:58:36 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Feb 26 09:58:43 fir-md1-s1 kernel: LustreError: 22261:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.8.26@o2ib6) failed to reply to blocking AST (req@ffff9cbd517a7b00 x1625960041592096 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9ca04049c5c0/0xb7044c64a15137f6 lrc: 4/0,0 mode: PR/PR res: [0x2c0007626:0x3e5e:0x0].0x0 bits 0x13/0x0 rrc: 316 type: IBT flags: 0x60200400000020 nid: 10.8.8.26@o2ib6 remote: 0x5b734a35d59674cf expref: 32 pid: 22251 timeout: 1189045 lvb_type: 0 Feb 26 09:58:43 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.8.26@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 26 09:58:43 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.8.26@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9ca04049c5c0/0xb7044c64a15137f6 lrc: 3/0,0 mode: PR/PR res: [0x2c0007626:0x3e5e:0x0].0x0 bits 0x13/0x0 rrc: 316 type: IBT flags: 0x60200400000020 nid: 10.8.8.26@o2ib6 remote: 0x5b734a35d59674cf expref: 33 pid: 22251 timeout: 0 lvb_type: 0 Feb 26 09:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 586a6e8e-70fe-af28-3ec3-56d6983d8923 (at 10.8.9.6@o2ib6) in 215 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9df42f8000, cur 1551203947 expire 1551203797 last 1551203732 Feb 26 09:59:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 10:03:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 10:03:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 10:03:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 10:03:46 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 10:07:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 3c0a6174-3393-d852-7ace-2d7d00293300 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc9d8a3f800, cur 1551204460 expire 1551204310 last 1551204233 Feb 26 10:07:40 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 10:13:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 10:13:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 10:13:47 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 10:13:47 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 10:23:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 10:23:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 10:23:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 10:23:48 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 10:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 86c6afcd-91e0-38ae-f048-b29dd98ae25f (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9a8922d800, cur 1551205926 expire 1551205776 last 1551205699 Feb 26 10:32:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 10:33:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 10:33:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 10:33:49 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Feb 26 10:33:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 10:38:11 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551206284/real 1551206284] req@ffff9cb97b3ee600 x1625960048046160/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551206291 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 10:38:11 fir-md1-s1 kernel: Lustre: 22261:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 10:38:31 fir-md1-s1 kernel: Lustre: 47885:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551206304/real 1551206304] req@ffff9cb97b262100 x1625960048058048/t0(0) o106->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551206311 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 10:38:31 fir-md1-s1 kernel: Lustre: 47885:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Feb 26 10:39:13 fir-md1-s1 kernel: Lustre: 47885:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551206346/real 1551206346] req@ffff9cb97b262100 x1625960048058048/t0(0) o106->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551206353 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 10:39:13 fir-md1-s1 kernel: Lustre: 47885:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 11 previous similar messages Feb 26 10:40:30 fir-md1-s1 
kernel: Lustre: 47885:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551206423/real 1551206423] req@ffff9cb97b262100 x1625960048058048/t0(0) o106->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551206430 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 10:40:30 fir-md1-s1 kernel: Lustre: 47885:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 21 previous similar messages Feb 26 10:40:38 fir-md1-s1 kernel: LustreError: 22261:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.3.11@o2ib6) failed to reply to blocking AST (req@ffff9cb97b3ee600 x1625960048046160 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cbfa52dd7c0/0xb7044c64a39bf4bb lrc: 4/0,0 mode: PR/PR res: [0x2c0007480:0x1453:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.3.11@o2ib6 remote: 0x3914a45b378e0bd6 expref: 67 pid: 21826 timeout: 1191560 lvb_type: 0 Feb 26 10:40:38 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.3.11@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 26 10:40:38 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.3.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cbfa52dd7c0/0xb7044c64a39bf4bb lrc: 3/0,0 mode: PR/PR res: [0x2c0007480:0x1453:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.8.3.11@o2ib6 remote: 0x3914a45b378e0bd6 expref: 68 pid: 21826 timeout: 0 lvb_type: 0 Feb 26 10:41:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b5df26d2-8439-f08e-34c5-eb65650c2837 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cafb0f84000, cur 1551206477 expire 1551206327 last 1551206250 Feb 26 10:41:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 10:43:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 10:43:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 10:43:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 10:43:50 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 10:51:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4afa537b-8741-2d9e-bcf4-6005547fa285 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb47b217000, cur 1551207063 expire 1551206913 last 1551206836 Feb 26 10:51:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 10:53:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 10:53:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 10:53:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 10:53:51 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 11:03:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cbf5d06d-ba23-f67d-ac30-844ab79b193e (at 10.9.103.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb4f6151400, cur 1551207823 expire 1551207673 last 1551207596 Feb 26 11:03:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 11:03:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:03:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 11:03:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 11:13:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c3993171-24c0-89a4-7cb1-27b4ddbf15a6 (at 10.8.6.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9fc025c000, cur 1551208393 expire 1551208243 last 1551208166 Feb 26 11:13:13 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:13:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 11:13:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 11:13:53 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 11:13:53 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 11:23:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 11:23:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:23:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 11:23:54 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 11:30:45 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a5270867-1f9c-2cd4-9490-f4af8b248386 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9ed28cc000, cur 1551209445 expire 1551209295 last 1551209218 Feb 26 11:30:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:32:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d9657556-3698-de72-acc2-cb9f2581779e (at 10.9.106.5@o2ib4) in 159 seconds. I think it's dead, and I am evicting it. exp ffff9cabfabe2c00, cur 1551209521 expire 1551209371 last 1551209362 Feb 26 11:32:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:33:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7a578c83-dd84-2f42-1dbc-7f07efe60dc5 (at 10.9.106.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9e95dcc000, cur 1551209589 expire 1551209439 last 1551209362 Feb 26 11:33:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 11:33:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 11:33:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 11:33:55 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 11:33:55 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 11:43:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 11:43:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 11:43:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 11:43:57 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 11:53:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 11:53:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 11:53:58 
fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 11:53:58 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 12:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 12:03:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 12:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 12:03:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 12:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 12:14:00 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 12:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 12:14:00 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 12:15:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client dace36ca-aee2-2621-1cf1-1972e147fbdb (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cad7df21800, cur 1551212101 expire 1551211951 last 1551211874 Feb 26 12:24:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 12:24:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 12:24:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 12:24:01 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 12:26:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 54802b78-b8a3-6ef2-1c4c-1f299e55d7f1 (at 10.9.103.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca82b746800, cur 1551212772 expire 1551212622 last 1551212545 Feb 26 12:26:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 12:34:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 12:34:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 12:34:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 12:34:02 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 12:37:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4bc75f04-b8d9-ad99-995d-27dacc9e399f (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc05d24d000, cur 1551213469 expire 1551213319 last 1551213242 Feb 26 12:37:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 12:44:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 12:44:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 12:44:03 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 12:44:03 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 12:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1dd4f190-5ae4-caa5-985f-1c9f4e428645 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca14ab6bc00, cur 1551214082 expire 1551213932 last 1551213855 Feb 26 12:48:02 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 12:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 12:54:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 12:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 12:54:04 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 12:58:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9ec06be-945c-dd77-0a54-c23157571370 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca2de495000, cur 1551214707 expire 1551214557 last 1551214480 Feb 26 12:58:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:02:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 86edcde5-a827-eaa7-4e0d-837c9d785f60 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c986d25c400, cur 1551214975 expire 1551214825 last 1551214748 Feb 26 13:02:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:04:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 13:04:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 13:04:05 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 13:04:05 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 13:10:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 41de1f40-a368-cb6e-edf5-62e1463dd452 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca2050e4c00, cur 1551215442 expire 1551215292 last 1551215215 Feb 26 13:10:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 13:14:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:14:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 13:14:06 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 26 13:16:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 70a023b4-747e-4af8-d033-8c85e93a3452 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca4eb899400, cur 1551215815 expire 1551215665 last 1551215588 Feb 26 13:16:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 13:24:07 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 13:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 13:24:07 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 13:24:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7b549f94-39e7-01a6-e5cf-5436bf8dbade (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caa3af4bc00, cur 1551216292 expire 1551216142 last 1551216065 Feb 26 13:24:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:33:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d9daee9c-edb1-9d2e-619d-f21193f2215f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cac05a4b400, cur 1551216807 expire 1551216657 last 1551216580 Feb 26 13:33:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:34:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 13:34:08 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 13:34:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 13:34:08 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 13:37:40 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ff6304fb-0905-2f57-9a69-fe16c0863243 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd34e70c800, cur 1551217060 expire 1551216910 last 1551216833 Feb 26 13:37:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:44:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 13:44:09 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 13:44:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 13:44:09 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 13:48:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 684ea0e7-0c08-6703-7248-c95b9a40c5aa (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca5d8f54800, cur 1551217706 expire 1551217556 last 1551217479 Feb 26 13:48:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 13:54:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 13:54:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 13:54:11 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 13:54:11 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 13:57:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4b878fb5-3dc5-fb97-9a4e-5290921e42e5 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc49cbad400, cur 1551218232 expire 1551218082 last 1551218005 Feb 26 13:57:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:04:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 14:04:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:04:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 14:04:12 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 14:05:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9fc3cd54-25e8-a4b2-3be6-326429584c71 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cab6bf9c000, cur 1551218742 expire 1551218592 last 1551218515 Feb 26 14:05:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:08:45 fir-md1-s1 kernel: Lustre: 22185:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551218918/real 1551218918] req@ffff9cbaff36f200 x1625960095436656/t0(0) o106->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551218925 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 14:08:45 fir-md1-s1 kernel: Lustre: 22185:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Feb 26 14:09:07 fir-md1-s1 kernel: Lustre: 22185:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551218939/real 1551218939] req@ffff9cbaff36f200 x1625960095436656/t0(0) o106->fir-MDT0000@10.8.3.11@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551218946 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 14:09:07 fir-md1-s1 kernel: Lustre: 22185:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 14:09:32 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 967de144-030f-0484-0bac-08f806ce1344 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cd2ffad1800, cur 1551218972 expire 1551218822 last 1551218745 Feb 26 14:09:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 14:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 14:14:13 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 14:14:13 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 14:16:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b8941599-a431-f47d-bd8e-4cd295239b41 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb40a797c00, cur 1551219381 expire 1551219231 last 1551219154 Feb 26 14:16:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:19:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e4e1992b-645c-7fa0-95fb-f97f854d138b (at 10.8.26.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca3a93bec00, cur 1551219544 expire 1551219394 last 1551219317 Feb 26 14:19:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 14:24:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 14:24:14 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 14:29:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d8aa5f17-9f20-7c2a-069a-5ce841acf2bf (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc3f160a000, cur 1551220190 expire 1551220040 last 1551219963 Feb 26 14:29:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 14:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 14:34:15 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 14:34:15 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 14:43:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2557db10-3079-820d-ecbc-02fa2573f957 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbeb5f49800, cur 1551221016 expire 1551220866 last 1551220789 Feb 26 14:43:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:44:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 14:44:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 14:44:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 14:44:16 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 14:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 14:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 14:54:17 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 14:54:17 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 14:55:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c294684b-f6c0-f1a6-6900-0457bc4ed9b5 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9c65340000, cur 1551221701 expire 1551221551 last 1551221474 Feb 26 14:55:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 15:04:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 15:04:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 15:04:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 15:04:18 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 15:14:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 15:14:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 15:14:19 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 15:14:19 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 15:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 15:24:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 15:24:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 15:24:20 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 15:30:40 fir-md1-s1 kernel: LustreError: 21983:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) returned error from glimpse AST (req@ffff9ca1204e7b00 x1625960112674144 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9ca99a666e40/0xb7044c64b6dd6f1d lrc: 4/0,0 mode: PW/PW res: [0x2c00076c1:0x13:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.20.15@o2ib6 remote: 0x12e2f7a651f8de2e expref: 63 pid: 22265 timeout: 0 lvb_type: 0 Feb 26 15:30:40 fir-md1-s1 kernel: 
LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Feb 26 15:30:40 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 605s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9ca99a666e40/0xb7044c64b6dd6f1d lrc: 4/0,0 mode: PW/PW res: [0x2c00076c1:0x13:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.20.15@o2ib6 remote: 0x12e2f7a651f8de2e expref: 64 pid: 22265 timeout: 0 lvb_type: 0 Feb 26 15:30:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 84832981-b60b-fcdd-c03c-f949a244fc50 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98a9a12000, cur 1551223854 expire 1551223704 last 1551223627 Feb 26 15:30:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 15:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 15:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 15:34:21 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 15:34:21 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 15:38:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c12a1684-dc00-3104-ef93-3cf20d893979 (at 10.9.106.8@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca2f59b6000, cur 1551224307 expire 1551224157 last 1551224080 Feb 26 15:38:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 15:44:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 15:44:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 15:44:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 15:44:22 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 15:46:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b24d4d9e-3a1a-723e-14ce-1a0c1b08163a (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbccf364400, cur 1551224796 expire 1551224646 last 1551224569 Feb 26 15:46:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 15:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 15:54:23 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 15:54:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 15:54:23 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 15:55:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client c2806c19-f08b-a384-fd39-3948eefaeb48 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb37220cc00, cur 1551225358 expire 1551225208 last 1551225131 Feb 26 15:55:58 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 16:04:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 16:04:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 16:04:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 16:04:25 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 16:14:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 16:14:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 16:14:26 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Feb 26 16:14:26 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 16:14:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 510c5105-88c7-dbd9-6053-47a0d7a4184b (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caeabaa5400, cur 1551226499 expire 1551226349 last 1551226272 Feb 26 16:14:59 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 16:24:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 16:24:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 16:24:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 16:24:27 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 16:34:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 16:34:28 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 16:34:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 16:34:28 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 16:36:06 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 39a5491d-e10b-c19b-ffc9-b833c49037a8 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cae83ff8400, cur 1551227766 expire 1551227616 last 1551227539 Feb 26 16:36:06 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 16:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 16:44:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 16:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 16:44:29 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 16:49:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a37c418b-5455-262c-ca39-fe09a1d64b6a (at 10.9.106.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caf05271400, cur 1551228560 expire 1551228410 last 1551228333 Feb 26 16:49:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 16:49:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a37c418b-5455-262c-ca39-fe09a1d64b6a (at 10.9.106.7@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca04bee8c00, cur 1551228570 expire 1551228420 last 1551228343 Feb 26 16:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 16:54:30 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 16:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 16:54:30 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 16:59:22 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551229155/real 1551229155] req@ffff9cb6e2e22a00 x1625960135181088/t0(0) o104->fir-MDT0000@10.8.15.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551229162 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 16:59:22 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Feb 26 16:59:29 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551229162/real 1551229162] req@ffff9cb6e2e22a00 x1625960135181088/t0(0) o104->fir-MDT0000@10.8.15.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551229169 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 16:59:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ece24e32-c30a-37fd-6034-503af11c37b1 (at 10.8.15.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb31db06000, cur 1551229182 expire 1551229032 last 1551228955 Feb 26 16:59:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 16:59:43 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551229176/real 1551229176] req@ffff9cb6e2e22a00 x1625960135181088/t0(0) o104->fir-MDT0000@10.8.15.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551229183 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 16:59:43 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 17:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 17:04:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 17:04:31 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 17:07:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5d4a704c-38f8-3c12-3cf9-aa44b7455628 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbfc8ec6000, cur 1551229672 expire 1551229522 last 1551229445 Feb 26 17:07:52 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 17:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 17:14:32 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 17:14:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 17:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1bc1040b-c71d-b7f4-505c-0eae20c9d5b2 (at 10.8.3.11@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9cc22d347400, cur 1551230248 expire 1551230098 last 1551230021 Feb 26 17:17:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:24:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 17:24:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:24:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 36dbaa49-ae67-d353-f5f2-30bad8178317 (at 10.9.101.47@o2ib4) Feb 26 17:24:33 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 26 17:30:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f05ae9bb-74a9-aaeb-6de7-2ec3e892aa1c (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c99556b6400, cur 1551231044 expire 1551230894 last 1551230817 Feb 26 17:30:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:34:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 17:34:34 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 17:34:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 17:34:34 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 17:40:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9805b277-87d0-b486-3f4c-d6cea507bc6f (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca28162a800, cur 1551231618 expire 1551231468 last 1551231391 Feb 26 17:40:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:44:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 06a1a62f-4958-2183-fc0f-5921522e8fe4 (at 10.9.101.47@o2ib4) reconnecting Feb 26 17:44:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 17:44:35 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 17:44:35 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 17:49:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 07de05ca-2f51-d8fa-31d4-2f1d95f20c90 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca2f0c07000, cur 1551232183 expire 1551232033 last 1551231956 Feb 26 17:49:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:50:35 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551232228/real 1551232228] req@ffff9cc0647e7200 x1625960145685776/t0(0) o104->fir-MDT0000@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551232235 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 17:50:35 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 17:50:42 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551232235/real 1551232235] req@ffff9cc0647e7200 x1625960145685776/t0(0) o104->fir-MDT0000@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551232242 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 17:50:49 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551232242/real 1551232242] req@ffff9cc0647e7200 x1625960145685776/t0(0) o104->fir-MDT0000@10.8.15.5@o2ib6:15/16 lens 296/224 e 
0 to 1 dl 1551232249 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 17:51:03 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551232256/real 1551232256] req@ffff9cc0647e7200 x1625960145685776/t0(0) o104->fir-MDT0000@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551232263 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 17:51:03 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 17:51:24 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551232277/real 1551232277] req@ffff9cc0647e7200 x1625960145685776/t0(0) o104->fir-MDT0000@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551232284 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 17:51:24 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 17:52:06 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551232319/real 1551232319] req@ffff9cc0647e7200 x1625960145685776/t0(0) o104->fir-MDT0000@10.8.15.5@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551232326 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 17:52:06 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 26 17:52:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 58716176-0030-c50b-f21a-0116eb9d9b93 (at 10.8.15.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb4288d0400, cur 1551232365 expire 1551232215 last 1551232138 Feb 26 17:52:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:53:02 fir-md1-s1 kernel: LustreError: 22225:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.15.5@o2ib6) failed to reply to blocking AST (req@ffff9cc0647e7200 x1625960145685776 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9cb1aab3bf00/0xb7044c64c1bf7942 lrc: 4/0,0 mode: PR/PR res: [0x200001798:0x2:0x0].0x0 bits 0x13/0x0 rrc: 101 type: IBT flags: 0x60200400000020 nid: 10.8.15.5@o2ib6 remote: 0xac08fa9a47bafc3b expref: 664 pid: 51439 timeout: 1217504 lvb_type: 0 Feb 26 17:53:02 fir-md1-s1 kernel: LustreError: 22225:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Feb 26 17:53:02 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.15.5@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 26 17:53:02 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Feb 26 17:53:02 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.15.5@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9cb1aab3bf00/0xb7044c64c1bf7942 lrc: 3/0,0 mode: PR/PR res: [0x200001798:0x2:0x0].0x0 bits 0x13/0x0 rrc: 101 type: IBT flags: 0x60200400000020 nid: 10.8.15.5@o2ib6 remote: 0xac08fa9a47bafc3b expref: 665 pid: 51439 timeout: 0 lvb_type: 0 Feb 26 17:53:02 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Feb 26 17:54:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 25348431-febb-9cbc-ab46-773891c4a456 (at 10.9.101.13@o2ib4) reconnecting Feb 26 17:54:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 17:54:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 7588591d-55f5-1531-1f50-b7daa2833142 (at 10.9.101.13@o2ib4) Feb 26 17:54:36 fir-md1-s1 
kernel: Lustre: Skipped 6 previous similar messages Feb 26 18:05:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a51994fb-3110-bb4f-ed4e-eced3e097c15 (at 10.8.1.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb436b64400, cur 1551233135 expire 1551232985 last 1551232908 Feb 26 18:05:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 18:06:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d2e5e5cf-1268-d36f-86c1-697d662b51f0 (at 10.8.20.15@o2ib6) in 168 seconds. I think it's dead, and I am evicting it. exp ffff9ca778fb2c00, cur 1551233211 expire 1551233061 last 1551233043 Feb 26 18:06:51 fir-md1-s1 kernel: Lustre: Skipped 20 previous similar messages Feb 26 18:07:22 fir-md1-s1 kernel: EXT4-fs (sdk2): error count since last fsck: 5 Feb 26 18:07:22 fir-md1-s1 kernel: EXT4-fs (sdk2): initial error at time 1550022155: ext4_mb_generate_buddy:757 Feb 26 18:07:22 fir-md1-s1 kernel: EXT4-fs (sdk2): last error at time 1550448029: ext4_mb_generate_buddy:757 Feb 26 18:07:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b5216915-2908-dac2-4e2f-aa11d003325b (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3f8f8b400, cur 1551233270 expire 1551233120 last 1551233043 Feb 26 18:07:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 18:10:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 26 18:10:13 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 26 18:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f2bd8263-263e-b013-a9c1-5f61f7b17ac2 (at 10.9.106.9@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc43bf60c00, cur 1551233681 expire 1551233531 last 1551233454 Feb 26 18:24:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to c51f1581-c2de-e3e4-93bf-651212d7bfde (at 10.8.14.2@o2ib6) Feb 26 18:24:57 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 18:33:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b2124be6-9114-eedd-f3c5-1909bfcb6010 (at 10.8.17.20@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb71a58c00, cur 1551234818 expire 1551234668 last 1551234591 Feb 26 18:33:38 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 18:36:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 054698ea-f84b-bd00-f4ed-c64e725d9902 (at 10.8.1.2@o2ib6) Feb 26 18:36:33 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 26 18:41:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5c4d5ec9-c7a9-4090-a2e7-0e452f357bdf (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca761333400, cur 1551235272 expire 1551235122 last 1551235045 Feb 26 18:41:12 fir-md1-s1 kernel: Lustre: Skipped 35 previous similar messages Feb 26 19:02:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.18.25@o2ib6) Feb 26 19:02:06 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 26 19:03:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.7.18@o2ib6) Feb 26 19:03:54 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 26 19:12:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 3932c835-e4bd-99a2-5e8c-8fdd68aa9cbf (at 10.9.106.6@o2ib4) Feb 26 19:12:31 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 26 19:15:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ebee093c-8e69-4e39-2895-94a502f715cc (at 10.9.113.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca08daa5000, cur 1551237348 expire 1551237198 last 1551237121 Feb 26 19:15:48 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 19:15:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e8f72dfe-f91f-5660-7d07-dfa3e84e9791 (at 10.9.113.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c982e393800, cur 1551237356 expire 1551237206 last 1551237129 Feb 26 19:15:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 19:31:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b42f3602-820d-dd01-5406-5780a0e4a943 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca79e3f7400, cur 1551238280 expire 1551238130 last 1551238053 Feb 26 19:35:34 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 26 19:35:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 19:40:52 fir-md1-s1 kernel: LustreError: 21288:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0000: BRW to missing obj [0x2000068fa:0xba:0x0] Feb 26 20:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 06a0640d-de7f-9715-947b-5ac203d15e9f (at 10.0.10.3@o2ib7) reconnecting Feb 26 20:04:00 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 20:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to 06a0640d-de7f-9715-947b-5ac203d15e9f (at 10.0.10.3@o2ib7) Feb 26 20:04:00 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 20:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e3d7d750-2a26-e81f-1160-2f3ee9d7f849 (at 10.9.106.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca08daa5400, cur 1551242229 expire 1551242079 last 1551242002 Feb 26 20:37:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 20:38:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client e78a174a-ae69-cb32-9b42-0306c7153992 (at 10.9.103.7@o2ib4) in 225 seconds. I think it's dead, and I am evicting it. exp ffff9c9965701400, cur 1551242305 expire 1551242155 last 1551242080 Feb 26 20:38:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 20:41:53 fir-md1-s1 kernel: LNet: Service thread pid 22267 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 20:41:53 fir-md1-s1 kernel: Pid: 22267, comm: mdt01_099 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 20:41:53 fir-md1-s1 kernel: Call Trace: Feb 26 20:41:53 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 20:41:53 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 20:41:53 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 20:41:53 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 20:41:53 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 20:41:53 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 20:41:53 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 20:41:53 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Feb 26 20:41:53 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 20:41:53 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551242513.22267 Feb 26 20:42:24 fir-md1-s1 kernel: LNet: Service thread pid 51440 was inactive for 200.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 20:42:24 fir-md1-s1 kernel: Pid: 51440, comm: mdt00_092 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 20:42:24 fir-md1-s1 kernel: Call Trace: Feb 26 20:42:24 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 20:42:24 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 20:42:24 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 20:42:24 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 20:42:24 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 20:42:24 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 20:42:24 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 20:42:24 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 20:42:24 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 20:42:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551242544.51440 Feb 26 20:42:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fbc086c3-52be-ac9a-e64f-08d421c53770 
(at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c998f0a2c00, cur 1551242549 expire 1551242399 last 1551242322 Feb 26 20:42:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 20:43:32 fir-md1-s1 kernel: LustreError: 22267:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551242312, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb1fd2ad100/0xb7044c64d4aea6ae lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22267 timeout: 0 lvb_type: 0 Feb 26 20:44:04 fir-md1-s1 kernel: LustreError: 51440:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551242344, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cbc0dbee300/0xb7044c64d4c35d58 lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 51440 timeout: 0 lvb_type: 0 Feb 26 20:45:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 26 20:48:27 fir-md1-s1 kernel: Lustre: 22204:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9ca90b605400 x1626138819072560/t0(0) o101->46ae8e18-a430-f0cb-b32c-b912583906ff@10.8.3.5@o2ib6:282/0 lens 616/3264 e 24 to 0 dl 1551242912 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 20:48:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 46ae8e18-a430-f0cb-b32c-b912583906ff (at 10.8.3.5@o2ib6) reconnecting Feb 26 20:48:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.3.5@o2ib6) Feb 26 20:48:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 20:48:59 fir-md1-s1 kernel: 
Lustre: 51443:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9c984123a700 x1626123901520496/t0(0) o101->4c56e9fc-d801-5654-35be-1d5aea53e0b4@10.9.101.44@o2ib4:314/0 lens 616/3264 e 21 to 0 dl 1551242944 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 20:49:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 4c56e9fc-d801-5654-35be-1d5aea53e0b4 (at 10.9.101.44@o2ib4) reconnecting Feb 26 20:49:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.101.44@o2ib4) Feb 26 20:52:58 fir-md1-s1 kernel: LNet: Service thread pid 22205 was inactive for 200.47s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 20:52:58 fir-md1-s1 kernel: Pid: 22205, comm: mdt02_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 20:52:58 fir-md1-s1 kernel: Call Trace: Feb 26 20:52:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 20:52:58 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 20:52:58 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 20:52:58 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 20:52:58 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 20:52:58 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 20:52:58 fir-md1-s1 kernel: [] 
kthread+0xd1/0xe0 Feb 26 20:52:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 20:52:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 20:52:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551243178.22205 Feb 26 20:54:37 fir-md1-s1 kernel: LustreError: 22205:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551242977, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc28ef35a00/0xb7044c64d68c2a01 lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22205 timeout: 0 lvb_type: 0 Feb 26 20:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 46ae8e18-a430-f0cb-b32c-b912583906ff (at 10.8.3.5@o2ib6) reconnecting Feb 26 20:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.3.5@o2ib6) Feb 26 20:59:05 fir-md1-s1 kernel: LustreError: 21904:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243245, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cac2c7172c0/0xb7044c64d7505b08 lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21904 timeout: 0 lvb_type: 0 Feb 26 20:59:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.9.101.44@o2ib4) Feb 26 21:01:38 fir-md1-s1 kernel: LustreError: 50752:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0007536:0x8011:0x0] Feb 26 21:02:07 fir-md1-s1 kernel: Lustre: 21927:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply#012 req@ffff9cbf3f6b3300 x1626151786050224/t0(0) o101->6fa78cd9-5268-3d22-5dd7-ed9967a00172@10.8.4.3@o2ib6:347/0 lens 616/3264 e 0 to 0 dl 1551243732 ref 2 fl 
Interpret:/0/0 rc 0/0 Feb 26 21:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 6fa78cd9-5268-3d22-5dd7-ed9967a00172 (at 10.8.4.3@o2ib6) reconnecting Feb 26 21:02:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 21:02:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72b2ac22-4dfa-46cc-3f81-2b4fa011eb7a (at 10.8.4.3@o2ib6) Feb 26 21:03:26 fir-md1-s1 kernel: LNet: Service thread pid 47881 was inactive for 200.19s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 21:03:26 fir-md1-s1 kernel: Pid: 47881, comm: mdt03_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 21:03:26 fir-md1-s1 kernel: Call Trace: Feb 26 21:03:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 21:03:27 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 21:03:27 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 21:03:27 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 21:03:27 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 21:03:27 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 21:03:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 21:03:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 21:03:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 21:03:27 fir-md1-s1 kernel: 
LustreError: dumping log to /tmp/lustre-log.1551243807.47881 Feb 26 21:03:44 fir-md1-s1 kernel: LNet: Service thread pid 21967 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 21:03:44 fir-md1-s1 kernel: Pid: 21967, comm: mdt01_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 21:03:44 fir-md1-s1 kernel: Call Trace: Feb 26 21:03:44 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 21:03:44 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 21:03:44 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 21:03:44 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 21:03:44 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 21:03:44 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 21:03:44 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 21:03:44 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 21:03:44 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 21:03:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551243824.21967 Feb 26 21:04:40 fir-md1-s1 kernel: LNet: Service thread pid 47887 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 26 21:04:40 fir-md1-s1 kernel: Pid: 47887, comm: mdt03_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 21:04:40 fir-md1-s1 kernel: Call Trace: Feb 26 21:04:40 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 21:04:40 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 21:04:40 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 21:04:40 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 21:04:40 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 21:04:40 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 21:04:40 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 21:04:40 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 21:04:40 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 21:04:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551243880.47887 Feb 26 21:05:06 fir-md1-s1 kernel: LustreError: 47881:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243606, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cbcee7c8000/0xb7044c64d867bdf8 lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 11 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 47881 timeout: 0 
lvb_type: 0 Feb 26 21:05:24 fir-md1-s1 kernel: LustreError: 21967:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243624, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cb3c66b0000/0xb7044c64d870e252 lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21967 timeout: 0 lvb_type: 0 Feb 26 21:06:20 fir-md1-s1 kernel: LustreError: 47887:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243680, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cc77b6f8fc0/0xb7044c64d891382d lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 47887 timeout: 0 lvb_type: 0 Feb 26 21:06:35 fir-md1-s1 kernel: Lustre: 22288:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply#012 req@ffff9cb0e0e45700 x1626120662704848/t0(0) o101->bdfb8fbe-b0b7-c92b-13a7-cd95abb6c09f@10.8.3.14@o2ib6:615/0 lens 616/3264 e 0 to 0 dl 1551244000 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 21:06:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client bdfb8fbe-b0b7-c92b-13a7-cd95abb6c09f (at 10.8.3.14@o2ib6) reconnecting Feb 26 21:06:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to (at 10.8.3.14@o2ib6) Feb 26 21:08:25 fir-md1-s1 kernel: Lustre: MGS: Connection restored to e3d7d750-2a26-e81f-1160-2f3ee9d7f849 (at 10.9.106.10@o2ib4) Feb 26 21:10:01 fir-md1-s1 kernel: Lustre: 51419:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cd0c6e38300 x1626169648597152/t0(0) o101->8c66dc95-793f-630e-2149-901ce83164ec@10.9.101.14@o2ib4:66/0 lens 616/3264 e 23 to 0 dl 1551244206 ref 2 fl 
Interpret:/0/0 rc 0/0 Feb 26 21:10:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client 8c66dc95-793f-630e-2149-901ce83164ec (at 10.9.101.14@o2ib4) reconnecting Feb 26 21:10:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 21:10:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 21:10:07 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 21:10:10 fir-md1-s1 kernel: LustreError: 21998:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551243910, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ccec2a4b180/0xb7044c64d929951b lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 12 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 21998 timeout: 0 lvb_type: 0 Feb 26 21:10:19 fir-md1-s1 kernel: Lustre: 21291:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9ca6ecb69800 x1626157480319664/t0(0) o101->0098b3ca-9c64-c96b-0dc4-0d4b5b3a3268@10.8.4.33@o2ib6:84/0 lens 616/3264 e 16 to 0 dl 1551244224 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 21:11:15 fir-md1-s1 kernel: Lustre: 21248:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cd0c2b3f800 x1626222349772544/t0(0) o101->592a51bd-e814-0c15-eeb6-9f1ef8a77f16@10.9.108.38@o2ib4:140/0 lens 616/3264 e 7 to 0 dl 1551244280 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 21:14:09 fir-md1-s1 kernel: LNet: Service thread pid 21904 was inactive for 1203.44s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 26 21:14:09 fir-md1-s1 kernel: Pid: 21904, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 21:14:09 fir-md1-s1 kernel: Call Trace: Feb 26 21:14:09 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 21:14:09 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 21:14:09 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 21:14:09 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 21:14:09 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 21:14:09 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 21:14:09 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 21:14:09 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 21:14:09 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 21:14:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551244449.21904 Feb 26 21:14:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 72b2ac22-4dfa-46cc-3f81-2b4fa011eb7a (at 10.8.4.3@o2ib6) Feb 26 21:14:49 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 26 21:15:05 fir-md1-s1 kernel: Lustre: 21247:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9ccd33bfef00 x1626240645669072/t0(0) 
o101->ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529@10.9.108.51@o2ib4:370/0 lens 616/3264 e 1 to 0 dl 1551244510 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 21:15:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 21:15:11 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 21:15:47 fir-md1-s1 kernel: LNet: Service thread pid 21998 was inactive for 636.93s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 21:15:47 fir-md1-s1 kernel: Pid: 21998, comm: mdt03_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 21:15:47 fir-md1-s1 kernel: Call Trace: Feb 26 21:15:47 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 21:15:47 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 21:15:47 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 21:15:47 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 21:15:47 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 21:15:47 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 21:15:47 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 21:15:47 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 21:15:47 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 21:15:47 fir-md1-s1 kernel: LustreError: dumping 
log to /tmp/lustre-log.1551244547.21998 Feb 26 21:19:28 fir-md1-s1 kernel: LustreError: 22272:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551244468, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9ccf61263180/0xb7044c64da5699dd lrc: 3/1,0 mode: --/PR res: [0x2c0007328:0x173:0x0].0xff4b9336 bits 0x2/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 22272 timeout: 0 lvb_type: 0 Feb 26 21:20:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 21:20:08 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 26 21:24:24 fir-md1-s1 kernel: Lustre: 21875:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/4), not sending early reply#012 req@ffff9ccfb4637500 x1626148092398272/t0(0) o101->dde93ffd-c5a9-e24e-8415-4db4b158b86b@10.9.101.20@o2ib4:173/0 lens 616/3264 e 1 to 0 dl 1551245068 ref 2 fl Interpret:/0/0 rc 0/0 Feb 26 21:25:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 21:25:12 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 21:29:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 11366750-6b6e-8b38-85bb-2768074a85aa (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9a993f1800, cur 1551245369 expire 1551245219 last 1551245142 Feb 26 21:29:29 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 21:30:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 21:30:09 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 21:34:29 fir-md1-s1 kernel: LNet: Service thread pid 22272 was inactive for 1201.01s. 
The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 26 21:34:29 fir-md1-s1 kernel: Pid: 22272, comm: mdt03_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 26 21:34:29 fir-md1-s1 kernel: Call Trace: Feb 26 21:34:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Feb 26 21:34:29 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Feb 26 21:34:29 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0x101d/0x1c30 [mdt] Feb 26 21:34:29 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Feb 26 21:34:29 fir-md1-s1 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Feb 26 21:34:29 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 26 21:34:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 26 21:34:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 26 21:34:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 26 21:34:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551245669.22272 Feb 26 21:35:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 21:35:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 21:36:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 52733c2f-f233-b5cc-d346-759443d05063 (at 10.8.26.33@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9ca660e5d000, cur 1551245774 expire 1551245624 last 1551245547 Feb 26 21:36:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 21:39:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ef0c3a49-59bd-2fd0-071f-d5153b593f5a (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb153f29800, cur 1551245996 expire 1551245846 last 1551245769 Feb 26 21:39:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 21:40:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 21:40:11 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 26 21:45:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 21:45:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 21:50:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 21:50:12 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 26 21:55:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 21:55:15 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 22:00:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 22:00:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 22:05:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 22:05:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 22:07:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cf9afcda-5408-c066-9de7-934feb7f8f94 (at 10.8.20.15@o2ib6) in 227 seconds. 
I think it's dead, and I am evicting it. exp ffff9ca7e4760400, cur 1551247673 expire 1551247523 last 1551247446 Feb 26 22:07:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 22:10:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 22:10:14 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 26 22:12:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8f1df1d4-2533-a800-2b3d-cffbe6a0c6bb (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb6dc7d9800, cur 1551247928 expire 1551247778 last 1551247701 Feb 26 22:12:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 22:15:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 22:15:17 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 22:18:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 24778ae5-cb3e-f669-8abc-afdf158d9942 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb75735400, cur 1551248333 expire 1551248183 last 1551248106 Feb 26 22:18:53 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 22:20:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 22:20:15 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 26 22:25:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 22:25:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 22:27:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 33b24dc2-6993-d831-6b12-a459c16cbaa4 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca190fb6800, cur 1551248862 expire 1551248712 last 1551248635 Feb 26 22:27:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 22:28:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 33b24dc2-6993-d831-6b12-a459c16cbaa4 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca6f2a57400, cur 1551248885 expire 1551248735 last 1551248658 Feb 26 22:28:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 22:30:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 22:30:16 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 26 22:35:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 22:35:19 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 22:40:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 22:40:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 22:45:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 22:45:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 22:50:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 22:50:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 22:51:12 fir-md1-s1 kernel: Lustre: 22187:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551250265/real 1551250265] req@ffff9c950a780c00 x1625960218857408/t0(0) o104->fir-MDT0002@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551250272 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 22:51:12 fir-md1-s1 kernel: Lustre: 
22187:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Feb 26 22:51:26 fir-md1-s1 kernel: Lustre: 22187:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551250279/real 1551250279] req@ffff9c950a780c00 x1625960218857408/t0(0) o104->fir-MDT0002@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551250286 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 22:51:26 fir-md1-s1 kernel: Lustre: 22187:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 22:51:47 fir-md1-s1 kernel: Lustre: 22187:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551250300/real 1551250300] req@ffff9c950a780c00 x1625960218857408/t0(0) o104->fir-MDT0002@10.8.26.33@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551250307 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 22:51:47 fir-md1-s1 kernel: Lustre: 22187:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 22:51:47 fir-md1-s1 kernel: LustreError: 22187:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.26.33@o2ib6) returned error from blocking AST (req@ffff9c950a780c00 x1625960218857408 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9caece3521c0/0xb7044c64e7992bb9 lrc: 4/0,0 mode: PR/PR res: [0x2c0007181:0x5e32:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x3859147b8a2c2899 expref: 50 pid: 22268 timeout: 1235436 lvb_type: 0 Feb 26 22:51:47 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.26.33@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Feb 26 22:51:47 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 42s: evicting client at 10.8.26.33@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9caece3521c0/0xb7044c64e7992bb9 lrc: 3/0,0 mode: PR/PR res: [0x2c0007181:0x5e32:0x0].0x0 bits 
0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.26.33@o2ib6 remote: 0x3859147b8a2c2899 expref: 51 pid: 22268 timeout: 0 lvb_type: 0 Feb 26 22:51:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3664c69e-7cfa-9e0a-6163-9c0f550b7fc7 (at 10.8.26.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9cd2eec800, cur 1551250312 expire 1551250162 last 1551250085 Feb 26 22:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f669be61-12f7-1c40-020d-d5e177414b94 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9bd77e4c00, cur 1551250322 expire 1551250172 last 1551250095 Feb 26 22:52:02 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 26 22:55:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 22:55:22 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 22:55:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a3c57bef-a739-0ea9-6582-283914517ba2 (at 10.9.103.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb4f6152800, cur 1551250537 expire 1551250387 last 1551250310 Feb 26 22:55:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 4a38b56a-a6ec-218e-3825-ed77a8d99e20 (at 10.9.103.31@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9f145f7400, cur 1551250556 expire 1551250406 last 1551250329 Feb 26 22:55:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 23:00:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 23:00:19 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 26 23:05:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 23:05:23 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 23:07:17 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551251230/real 1551251230] req@ffff9cb1e4f0ce00 x1625960220980736/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551251237 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 23:07:24 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551251237/real 1551251237] req@ffff9cb1e4f0ce00 x1625960220980736/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551251244 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:07:38 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551251251/real 1551251251] req@ffff9cb1e4f0ce00 x1625960220980736/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551251258 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:07:38 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 23:07:59 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551251272/real 1551251272] req@ffff9cb1e4f0ce00 x1625960220980736/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 
1551251279 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:07:59 fir-md1-s1 kernel: Lustre: 22288:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 23:08:37 fir-md1-s1 kernel: Lustre: 51414:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551251310/real 1551251310] req@ffff9cd19d65d700 x1625960221070976/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551251317 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:08:37 fir-md1-s1 kernel: Lustre: 51414:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Feb 26 23:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 1740c437-48db-49df-0c57-e20eac633aec (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9aa4724c00, cur 1551251374 expire 1551251224 last 1551251147 Feb 26 23:09:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1740c437-48db-49df-0c57-e20eac633aec (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cac6e787000, cur 1551251381 expire 1551251231 last 1551251154 Feb 26 23:09:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 23:10:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 23:10:20 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 23:15:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 23:15:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 23:20:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 23:20:21 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 26 23:23:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 18e2d41e-26e3-b328-08f9-303d49168e96 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb05636b000, cur 1551252209 expire 1551252059 last 1551251982 Feb 26 23:25:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 23:25:25 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 23:27:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4e98aca1-5176-7dc4-0f5e-99bb1a25f4d5 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca662643800, cur 1551252425 expire 1551252275 last 1551252198 Feb 26 23:27:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 23:30:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 23:30:22 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 26 23:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 23:35:26 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 23:40:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 23:40:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 23:43:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b3363dd3-6f10-4e81-f362-10865357e42b (at 10.9.106.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9e9816d800, cur 1551253400 expire 1551253250 last 1551253173 Feb 26 23:43:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 23:43:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b3363dd3-6f10-4e81-f362-10865357e42b (at 10.9.106.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc47807ec00, cur 1551253405 expire 1551253255 last 1551253178 Feb 26 23:43:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 23:44:06 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551253438/real 1551253438] req@ffff9cb7c2f8a100 x1625960225391472/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551253445 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 26 23:44:06 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 18 previous similar messages Feb 26 23:44:20 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551253453/real 1551253453] req@ffff9cb7c2f8a100 x1625960225391472/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551253460 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:44:20 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 26 23:44:41 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551253474/real 1551253474] req@ffff9cb7c2f8a100 x1625960225391472/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551253481 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:44:41 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 26 23:45:23 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551253516/real 1551253516] req@ffff9cb7c2f8a100 x1625960225391472/t0(0) o104->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551253523 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 26 23:45:23 fir-md1-s1 kernel: Lustre: 22225:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar 
messages Feb 26 23:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 23:45:27 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 26 23:46:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9fb5bc8f-c015-38a2-4d97-6a68a3eb423b (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9d54709400, cur 1551253569 expire 1551253419 last 1551253342 Feb 26 23:46:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9fb5bc8f-c015-38a2-4d97-6a68a3eb423b (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb343fbfc00, cur 1551253577 expire 1551253427 last 1551253350 Feb 26 23:46:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 26 23:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 26 23:50:25 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 26 23:52:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6d0c0e81-8f24-8d59-6a5c-99645daffcc0 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caacf7ecc00, cur 1551253926 expire 1551253776 last 1551253699 Feb 26 23:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 26 23:55:28 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 26 23:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c1606760-dc37-0be3-522c-32eac969b366 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca88b746000, cur 1551254244 expire 1551254094 last 1551254017 Feb 26 23:57:24 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 23:58:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aed98c92-7516-2fd2-ad8f-6d6af3f10bd0 (at 10.8.20.15@o2ib6) in 160 seconds. I think it's dead, and I am evicting it. exp ffff9ca2261cd800, cur 1551254320 expire 1551254170 last 1551254160 Feb 26 23:58:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 26 23:59:47 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 19f6684c-eb87-8c16-c26e-4ebfa118e108 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c99e4197c00, cur 1551254387 expire 1551254237 last 1551254160 Feb 26 23:59:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 00:00:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 00:00:26 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Feb 27 00:02:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 827df6b9-8791-e80a-263e-89c9d4786313 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cacb2fc2000, cur 1551254559 expire 1551254409 last 1551254332 Feb 27 00:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 00:05:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 00:10:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 00:10:27 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 00:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 00:15:30 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 00:20:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 00:20:28 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 00:25:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 00:25:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 00:26:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 12ed72a6-9ff3-0134-3548-4cdffb71fecc (at 10.8.29.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc47807b400, cur 1551255984 expire 1551255834 last 1551255757 Feb 27 00:26:24 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 27 00:30:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 00:30:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 00:35:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 00:35:32 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 00:40:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 00:40:30 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 00:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 6792fc33-6546-9fba-3aff-deaf8715d928 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb470767400, cur 1551256882 expire 1551256732 last 1551256655 Feb 27 00:41:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 00:45:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 00:45:33 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 00:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 00:50:31 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 00:55:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 00:55:34 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:00:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 01:00:32 fir-md1-s1 kernel: Lustre: Skipped 10 previous 
similar messages Feb 27 01:05:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 01:05:35 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 01:10:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 01:10:33 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:15:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 01:15:36 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:20:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 01:20:34 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:23:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 90c91e75-46ba-4a46-2672-39282762b97c (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cafc6ac6400, cur 1551259436 expire 1551259286 last 1551259209 Feb 27 01:23:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 01:25:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 01:25:38 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 01:26:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client afb4e635-c62a-c4c5-c50e-c4203c8fbfa8 (at 10.8.4.12@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca779ec0c00, cur 1551259599 expire 1551259449 last 1551259372 Feb 27 01:26:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 01:30:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 01:30:35 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 01:30:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6a1c2c98-0d47-91b2-781a-f670f1c7d341 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca866665c00, cur 1551259846 expire 1551259696 last 1551259619 Feb 27 01:30:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 01:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 01:35:39 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:40:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 01:40:37 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:42:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9cb3db7d-519e-ab2a-317b-c21da9a59f55 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb730aa5000, cur 1551260555 expire 1551260405 last 1551260328 Feb 27 01:42:35 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 01:45:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 01:45:40 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 01:47:22 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ba784191-a134-c73d-ee15-d1a80ab225d6 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb108fce000, cur 1551260842 expire 1551260692 last 1551260615 Feb 27 01:47:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 01:50:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 01:50:38 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 01:52:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 299e34ad-357d-a63b-a82f-0b7a2579ea0e (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb39476d800, cur 1551261124 expire 1551260974 last 1551260897 Feb 27 01:52:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 01:55:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 01:55:41 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 02:00:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4d30b815-3d3c-7361-5607-a211379e0754 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb37ff4d800, cur 1551261636 expire 1551261486 last 1551261409 Feb 27 02:00:36 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 02:00:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 02:00:39 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 02:05:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 02:05:42 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 02:08:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 81f5800f-4317-2f32-0d09-cb2da5456380 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9501189400, cur 1551262107 expire 1551261957 last 1551261880 Feb 27 02:08:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 02:08:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 81f5800f-4317-2f32-0d09-cb2da5456380 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca7fbe95800, cur 1551262128 expire 1551261978 last 1551261901 Feb 27 02:08:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 02:09:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ecfc1d1f-def9-0233-b190-86369353b9ea (at 10.8.20.15@o2ib6) in 179 seconds. I think it's dead, and I am evicting it. exp ffff9cc3d5f69400, cur 1551262183 expire 1551262033 last 1551262004 Feb 27 02:10:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 02:10:40 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 27 02:15:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 02:15:43 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 02:16:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9e7293dd-ec85-5c9b-e0d5-ec0e459ab351 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb475ab5c00, cur 1551262568 expire 1551262418 last 1551262341 Feb 27 02:16:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 02:20:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 02:20:41 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 02:25:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 02:25:44 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 02:30:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 02:30:42 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 02:35:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 02:35:45 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 02:40:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3fd9c8a9-9b32-aa2a-e9eb-390c019a0b2c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca093ea6400, cur 1551264033 expire 1551263883 last 1551263806 Feb 27 02:40:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 02:40:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 02:40:43 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 02:45:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 02:45:46 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 02:50:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 02:50:44 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 02:55:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 02:55:47 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:00:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 03:00:45 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:05:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 03:05:48 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 03:10:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 03:10:46 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 03:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 03:15:49 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:17:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
1c74969a-bec7-76ba-47aa-a9f0b48eff93 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca492435c00, cur 1551266248 expire 1551266098 last 1551266021 Feb 27 03:17:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 03:20:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 03:20:47 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 03:25:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 03:25:50 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:30:09 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 879d94a8-a845-ea21-f6e8-a2d093701c88 (at 10.9.103.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc01bf22000, cur 1551267009 expire 1551266859 last 1551266782 Feb 27 03:30:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 03:30:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 03:30:48 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 03:35:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 03:35:52 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 03:40:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 03:40:50 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:45:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 03:45:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:50:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 
85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 03:50:51 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 03:53:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c983e690-4a52-f452-dd68-e347d580a80d (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc21f351000, cur 1551268405 expire 1551268255 last 1551268178 Feb 27 03:53:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 03:55:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 03:55:54 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 04:00:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 04:00:52 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 04:05:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 04:05:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 04:10:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 04:10:53 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 27 04:15:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 04:15:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 04:20:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 04:20:54 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 04:25:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 04:25:57 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar 
messages Feb 27 04:27:13 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e4fc3255-f24e-b9b9-b1cd-0107702f059a (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd1253fdc00, cur 1551270433 expire 1551270283 last 1551270206 Feb 27 04:27:13 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 04:30:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 04:30:55 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 04:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ad459c81-c80e-33d1-0264-0f3efe91204c (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c95f13bd400, cur 1551270926 expire 1551270776 last 1551270699 Feb 27 04:35:26 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 04:35:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 04:35:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 04:40:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 04:40:56 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 04:45:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 04:45:59 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 04:50:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 04:50:57 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 04:56:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 04:56:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages 
Feb 27 05:00:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 05:00:58 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 05:01:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 13efc053-bc5b-08f3-50f2-203848f33f8f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb67f09000, cur 1551272505 expire 1551272355 last 1551272278 Feb 27 05:01:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 05:06:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 05:06:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 05:10:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 05:10:59 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 05:16:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 05:16:02 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 05:20:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f7a0d0aa-251a-211d-7350-340df04c351e (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca67f655800, cur 1551273606 expire 1551273456 last 1551273379 Feb 27 05:20:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 05:21:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 05:21:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 05:26:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 05:26:03 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 05:31:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 05:31:01 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 05:36:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 05:36:04 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 05:36:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 423c99a8-9686-5e73-248a-69577cd9075f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb34274e800, cur 1551274568 expire 1551274418 last 1551274341 Feb 27 05:36:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 05:41:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 05:41:03 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 05:46:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 05:46:06 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 05:51:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 05:51:04 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 05:56:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 05:56:07 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 05:56:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 63d9bf75-d764-a9a0-017c-96ce55afe613 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9bf5410400, cur 1551275769 expire 1551275619 last 1551275542 Feb 27 05:56:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:01:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 06:01:05 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 06:04:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1e55a8fe-1155-1620-97fd-4f31c961742b (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb28aa52c00, cur 1551276243 expire 1551276093 last 1551276016 Feb 27 06:04:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:05:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f6d62d24-c3c1-00e7-ae80-d1625c77a42d (at 10.8.20.15@o2ib6) in 174 seconds. I think it's dead, and I am evicting it. exp ffff9caa6b79ec00, cur 1551276319 expire 1551276169 last 1551276145 Feb 27 06:05:19 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:06:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 06:06:08 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 06:08:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 07f0334b-24cb-aa64-e0c3-2e0fcb7b097e (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc2547c4800, cur 1551276484 expire 1551276334 last 1551276257 Feb 27 06:08:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:11:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 06:11:06 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Feb 27 06:13:05 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e86fea82-7799-12a3-9b83-0f8ff37cad2a (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cd309383800, cur 1551276785 expire 1551276635 last 1551276558 Feb 27 06:13:05 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:16:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 06:16:09 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 06:21:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 06:21:07 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 06:26:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 06:26:10 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 06:31:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 06:31:08 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 06:36:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 06:36:11 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 06:39:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 56402132-371a-7044-687a-ecd11f531e92 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb12d2f7400, cur 1551278390 expire 1551278240 last 1551278163 Feb 27 06:39:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:41:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f1a67df7-3c12-179a-1bf4-2020df1bb713 (at 10.8.20.15@o2ib6) in 195 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9e4eb5c000, cur 1551278466 expire 1551278316 last 1551278271 Feb 27 06:41:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:41:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 06:41:09 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 06:46:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 06:46:12 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 06:51:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 06:51:10 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 27 06:55:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aea7f273-f3cc-2f08-48e6-ad9397da92d1 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cae0ce82000, cur 1551279323 expire 1551279173 last 1551279096 Feb 27 06:55:23 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 06:56:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 06:56:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 07:01:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 07:01:11 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 07:06:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 07:06:14 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 07:11:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 07:11:12 fir-md1-s1 kernel: Lustre: Skipped 10 previous 
similar messages Feb 27 07:16:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 07:16:15 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 07:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 47b4e7e9-2ee3-7739-8fac-75d29b35e829 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cba33af8400, cur 1551280659 expire 1551280509 last 1551280432 Feb 27 07:17:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 07:21:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 07:21:13 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 07:22:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fc373204-9bff-b372-8d85-a510d8cb4d1f (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c98ddd94400, cur 1551280964 expire 1551280814 last 1551280737 Feb 27 07:22:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 07:26:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 07:26:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 07:31:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 07:31:14 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 07:36:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 07:36:17 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 07:41:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 07:41:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous 
similar messages Feb 27 07:44:39 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 81128fed-962e-75ef-f7a3-b64fecfc91b8 (at 10.8.18.31@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc3e7b22400, cur 1551282279 expire 1551282129 last 1551282052 Feb 27 07:44:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 07:46:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 07:46:18 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 07:51:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 07:51:17 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 07:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f950e089-59a0-f88a-130c-04245d41d8a3 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caf82b6c400, cur 1551282735 expire 1551282585 last 1551282508 Feb 27 07:52:15 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 07:56:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 07:56:20 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 08:01:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 08:01:18 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 08:06:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 08:06:21 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 08:09:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2c2890c3-b76f-2dc9-03f1-91b8e0c7ca35 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9c0decbc00, cur 1551283744 expire 1551283594 last 1551283517 Feb 27 08:09:04 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 08:09:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2c2890c3-b76f-2dc9-03f1-91b8e0c7ca35 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9852e75400, cur 1551283753 expire 1551283603 last 1551283526 Feb 27 08:09:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 08:11:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 08:11:19 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 08:11:56 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551283909/real 1551283909] req@ffff9cac43f6b600 x1625960287841328/t0(0) o106->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551283916 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 27 08:11:56 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Feb 27 08:12:10 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551283923/real 1551283923] req@ffff9cac43f6b600 x1625960287841328/t0(0) o106->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551283930 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 08:12:10 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 27 08:12:31 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551283944/real 1551283944] req@ffff9cac43f6b600 x1625960287841328/t0(0) o106->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551283951 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 08:12:31 fir-md1-s1 
kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 27 08:13:13 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551283986/real 1551283986] req@ffff9cac43f6b600 x1625960287841328/t0(0) o106->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551283993 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 08:13:13 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 27 08:14:30 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551284063/real 1551284063] req@ffff9cac43f6b600 x1625960287841328/t0(0) o106->fir-MDT0002@10.8.18.31@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551284070 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 08:14:30 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Feb 27 08:15:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client d1ce9bb9-28c1-286a-36e5-917eec823e48 (at 10.8.18.34@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca7d3f93000, cur 1551284108 expire 1551283958 last 1551283881 Feb 27 08:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 08:16:22 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 08:21:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 08:21:20 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Feb 27 08:25:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 65607c0b-31d0-b85c-0cd9-98dee3bda486 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc06be23c00, cur 1551284747 expire 1551284597 last 1551284520 Feb 27 08:25:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 08:26:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 08:26:23 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 08:31:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 08:31:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 08:36:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 08:36:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 08:41:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 08:41:22 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 08:46:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 08:46:25 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 08:51:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 08:51:23 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 08:56:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 08:56:26 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 09:01:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 09:01:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 09:04:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 
86ba1f57-447a-970a-e35d-f9f3e10dbc28 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cab64b89000, cur 1551287057 expire 1551286907 last 1551286830 Feb 27 09:04:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 09:06:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 09:06:27 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 09:11:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 09:11:25 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 09:16:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd263e8b-14cd-e651-91d0-4af7bbc2a545 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca6a421b400, cur 1551287782 expire 1551287632 last 1551287555 Feb 27 09:16:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 09:16:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 09:16:28 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 09:21:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 09:21:26 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 09:25:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7d683273-d398-7147-ee05-936affa82945 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc4782d3400, cur 1551288327 expire 1551288177 last 1551288100 Feb 27 09:25:27 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 09:26:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 09:26:29 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 09:31:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 09:31:27 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 09:34:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3cff31fc-f52f-69eb-eab0-ac3dab3d2254 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca73a01e400, cur 1551288861 expire 1551288711 last 1551288634 Feb 27 09:34:21 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 09:36:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 09:36:30 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 09:41:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 09:41:29 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 09:46:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 55d97700-f972-3c79-f6f3-212281f5861e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb1f6ca1400, cur 1551289561 expire 1551289411 last 1551289334 Feb 27 09:46:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 09:46:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 09:46:31 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 09:51:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 09:51:30 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 09:56:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1a7c4a85-48b4-e0ef-6a00-6789f4a82b7f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cac8b805400, cur 1551290178 expire 1551290028 last 1551289951 Feb 27 09:56:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 09:56:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 09:56:32 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 10:01:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 10:01:31 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 10:05:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 2cff157b-a3ef-8f81-5463-c2f2cc4850b1 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbaf16cc400, cur 1551290741 expire 1551290591 last 1551290514 Feb 27 10:05:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 10:06:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 10:06:34 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 10:11:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 10:11:32 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 10:15:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 06760105-807c-acfa-970d-fe0f142109ab (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb953f8c00, cur 1551291349 expire 1551291199 last 1551291122 Feb 27 10:15:49 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 10:16:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 10:16:35 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 10:21:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 10:21:33 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 10:24:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a00ef1bc-8c52-d936-6bb2-fe20514b99b9 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb3e9549c00, cur 1551291874 expire 1551291724 last 1551291647 Feb 27 10:24:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 10:26:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 10:26:36 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 10:31:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 10:31:34 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 10:34:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a0a153a4-af25-0e81-ae8c-7970a5f64554 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca7c6a46c00, cur 1551292471 expire 1551292321 last 1551292244 Feb 27 10:34:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 10:36:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 10:36:37 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 10:41:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 10:41:35 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 10:44:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4d4c1a55-6f3c-9a92-ad05-8c201d929d06 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca467012c00, cur 1551293040 expire 1551292890 last 1551292813 Feb 27 10:44:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 10:45:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b9d5ee48-d92d-0b8f-b217-c073d8cf4946 (at 10.9.103.2@o2ib4) in 195 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc01bf20c00, cur 1551293116 expire 1551292966 last 1551292921 Feb 27 10:45:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 10:45:48 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client b81b6ec1-ac3a-c940-40fc-f881fcdab215 (at 10.9.103.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca75b7b3000, cur 1551293148 expire 1551292998 last 1551292921 Feb 27 10:45:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 10:46:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 10:46:38 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 10:51:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 10:51:36 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 10:55:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 60653fc4-7600-7253-4897-510711012c5a (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca645b4b800, cur 1551293747 expire 1551293597 last 1551293520 Feb 27 10:56:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 10:56:39 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 11:01:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 11:01:37 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 27 11:01:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client cbccc8dc-d0d0-7a5d-19fa-5016ce74ce38 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c951245ec00, cur 1551294103 expire 1551293953 last 1551293876 Feb 27 11:01:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 11:06:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 11:06:40 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 11:11:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9c6816c3-0271-77cd-ee52-d8b7eebd9d01 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb940fb1c00, cur 1551294663 expire 1551294513 last 1551294436 Feb 27 11:11:03 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 11:11:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 11:11:38 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 11:16:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 11:16:41 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 11:21:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 11:21:39 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 11:24:37 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551295470/real 1551295470] req@ffff9cab184c3300 x1625960309053568/t0(0) o106->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551295477 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 27 11:24:37 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 27 11:24:58 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551295491/real 
1551295491] req@ffff9cab184c3300 x1625960309053568/t0(0) o106->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551295498 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 11:24:58 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 27 11:25:40 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551295533/real 1551295533] req@ffff9cab184c3300 x1625960309053568/t0(0) o106->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551295540 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 11:25:40 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 27 11:26:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 11:26:42 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 11:26:56 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551295609/real 1551295609] req@ffff9cbc8d37a100 x1625960309396112/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551295616 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 27 11:26:56 fir-md1-s1 kernel: Lustre: 21919:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 19 previous similar messages Feb 27 11:27:36 fir-md1-s1 kernel: LustreError: 22189:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) returned error from glimpse AST (req@ffff9cab184c3300 x1625960309053568 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9ca417074800/0xb7044c65201fed41 lrc: 6/0,0 mode: PW/PW res: [0x2c00076f0:0xd:0x0].0x0 bits 0x40/0x0 rrc: 10 type: IBT flags: 0x40200000000000 nid: 10.8.20.15@o2ib6 remote: 0xf5d4ae45d8ad633f expref: 183 pid: 21291 timeout: 0 lvb_type: 0 Feb 27 11:27:36 fir-md1-s1 kernel: LustreError: 
138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Feb 27 11:27:36 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1551295656s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9ca417074800/0xb7044c65201fed41 lrc: 6/0,0 mode: PW/PW res: [0x2c00076f0:0xd:0x0].0x0 bits 0x40/0x0 rrc: 11 type: IBT flags: 0x40200000000000 nid: 10.8.20.15@o2ib6 remote: 0xf5d4ae45d8ad633f expref: 184 pid: 21291 timeout: 0 lvb_type: 0 Feb 27 11:27:36 fir-md1-s1 kernel: LustreError: 22189:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) Skipped 1 previous similar message Feb 27 11:28:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 084655ec-d855-a8eb-cec8-de4cebea897b (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbbfe2e4c00, cur 1551295684 expire 1551295534 last 1551295457 Feb 27 11:28:04 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 11:31:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 11:31:40 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 11:34:42 fir-md1-s1 kernel: Lustre: 21245:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551296038/real 1551296038] req@ffff9cbdeb351800 x1625960310554240/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551296082 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 27 11:34:42 fir-md1-s1 kernel: Lustre: 21245:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 23 previous similar messages Feb 27 11:36:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 11:36:43 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 11:36:55 fir-md1-s1 kernel: LustreError: 
21245:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) failed to reply to blocking AST (req@ffff9cbdeb351800 x1625960310554240 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cba60f44a40/0xb7044c6522d997e0 lrc: 4/0,0 mode: PR/PR res: [0x2c0007618:0x29a:0x0].0x0 bits 0x13/0x0 rrc: 80 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xfe8d1a97dfe88fb9 expref: 59 pid: 21803 timeout: 1281299 lvb_type: 0 Feb 27 11:36:55 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 27 11:36:55 fir-md1-s1 kernel: LustreError: Skipped 1 previous similar message Feb 27 11:36:55 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 177s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cba60f44a40/0xb7044c6522d997e0 lrc: 3/0,0 mode: PR/PR res: [0x2c0007618:0x29a:0x0].0x0 bits 0x13/0x0 rrc: 80 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0xfe8d1a97dfe88fb9 expref: 60 pid: 21803 timeout: 0 lvb_type: 0 Feb 27 11:41:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 11:41:42 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 11:45:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a5975f23-557d-1956-7ad0-48974842cd06 (at 10.8.9.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9aceb3e000, cur 1551296709 expire 1551296559 last 1551296482 Feb 27 11:45:09 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 27 11:46:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 11:46:44 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 11:51:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 11:51:43 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 11:55:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 52f3795b-786e-c941-f1fe-da136d13043c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb0d304a000, cur 1551297320 expire 1551297170 last 1551297093 Feb 27 11:55:20 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 11:56:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 11:56:45 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 12:01:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 12:01:44 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 12:06:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 12:06:46 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 12:10:51 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8bff2243-5d47-aa0d-ee22-f6395580006e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb2e5140400, cur 1551298251 expire 1551298101 last 1551298024 Feb 27 12:10:51 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 27 12:11:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 12:11:45 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 12:16:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 12:16:48 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 12:21:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 12:21:46 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 12:22:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2aa6ece8-715d-20bc-af20-9a246b513f57 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca8654c7800, cur 1551298931 expire 1551298781 last 1551298704 Feb 27 12:22:11 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 27 12:26:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 12:26:49 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 12:31:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 12:31:47 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 27 12:35:19 fir-md1-s1 kernel: Lustre: DEBUG MARKER: Wed Feb 27 12:35:19 2019 Feb 27 12:36:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 81a79a66-9742-198c-a78e-d8b1866bc7f7 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cd2b5f17000, cur 1551299793 expire 1551299643 last 1551299566 Feb 27 12:36:33 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 12:36:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 12:36:50 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 12:41:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 12:41:48 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 12:46:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 12:46:51 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 12:51:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 00bb6ae3-a9d5-86ec-cfe9-e83fc2e636f8 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cad2e199c00, cur 1551300678 expire 1551300528 last 1551300451 Feb 27 12:51:18 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 12:51:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 12:51:49 fir-md1-s1 kernel: Lustre: Skipped 14 previous similar messages Feb 27 12:56:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 12:56:52 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 13:01:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 13:01:50 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 13:06:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 13:06:53 fir-md1-s1 kernel: Lustre: Skipped 8 previous 
similar messages Feb 27 13:09:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 45f5bc21-d8f3-47fb-bbc1-d662f914bf50 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbd03a26000, cur 1551301790 expire 1551301640 last 1551301563 Feb 27 13:09:50 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 13:11:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 13:11:51 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 13:16:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 13:16:54 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 13:20:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 6a7a623b-69df-2d9f-ef46-e123ab7ce14e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc19caff800, cur 1551302425 expire 1551302275 last 1551302198 Feb 27 13:20:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 13:21:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 13:21:52 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 13:26:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 13:26:55 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 13:30:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17f79f6d-3412-2976-c13a-45dd639ec025 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc35e25ac00, cur 1551303056 expire 1551302906 last 1551302829 Feb 27 13:30:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 13:31:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 13:31:53 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 13:36:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 13:36:56 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 13:41:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 13:41:54 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 13:46:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 13:46:57 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 13:51:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 13:51:56 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 13:55:20 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ad24ddbc-b8d1-77f8-afa2-32099a8e9355 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cd187360400, cur 1551304520 expire 1551304370 last 1551304293 Feb 27 13:55:20 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 27 13:56:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 13:56:58 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 14:01:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 14:01:57 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 14:06:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 14:06:59 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 14:08:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ce9ed73a-7ee8-4c7f-fd68-05dc5ca1f1ed (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbbb9faa800, cur 1551305311 expire 1551305161 last 1551305084 Feb 27 14:08:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 14:11:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 14:11:58 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 14:17:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 14:17:00 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 14:21:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 14:21:59 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 14:23:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client fd0d1392-0542-d631-2d0b-031376bd132c (at 10.9.106.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c98c7a9e400, cur 1551306236 expire 1551306086 last 1551306009 Feb 27 14:23:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 14:24:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7e4f58d8-6e8c-32f9-59fb-7de9af0df4b4 (at 10.9.106.12@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb3f7e8d400, cur 1551306247 expire 1551306097 last 1551306020 Feb 27 14:24:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 14:27:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 14:27:01 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 14:32:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 14:32:00 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 14:37:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 14:37:03 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 14:42:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 14:42:01 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 14:47:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 14:47:04 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 14:52:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 14:52:02 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 14:57:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 14:57:05 fir-md1-s1 kernel: Lustre: Skipped 7 previous 
similar messages Feb 27 15:02:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 15:02:03 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 15:07:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 15:07:06 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 15:12:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 15:12:04 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 15:17:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 15:17:07 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 15:22:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 15:22:05 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 15:27:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 15:27:08 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 15:32:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 15:32:06 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 15:37:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 15:37:09 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 15:37:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 68521de7-1794-c75f-f6e3-1d6477a534d1 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca772752400, cur 1551310648 expire 1551310498 last 1551310421 Feb 27 15:42:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 15:42:07 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 27 15:47:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 15:47:10 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 15:49:11 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3d8a67f4-2492-af32-af26-1fb7013841d0 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cab876cf400, cur 1551311351 expire 1551311201 last 1551311124 Feb 27 15:49:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 15:51:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client aa26fc7d-8d56-d044-df07-735cb4ad01f5 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cab002d8000, cur 1551311504 expire 1551311354 last 1551311277 Feb 27 15:51:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 15:51:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client aa26fc7d-8d56-d044-df07-735cb4ad01f5 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cab044dcc00, cur 1551311507 expire 1551311357 last 1551311280 Feb 27 15:51:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 15:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 15:52:08 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 15:57:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 15:57:11 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:02:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 16:02:10 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 16:07:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 16:07:12 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 16:12:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 16:12:11 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 16:17:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 16:17:13 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:22:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 16:22:12 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:22:28 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551313341/real 1551313341] req@ffff9cd056b82d00 x1625960353283760/t0(0) o106->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551313348 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 
Feb 27 16:22:28 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Feb 27 16:23:10 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551313383/real 1551313383] req@ffff9cd056b82d00 x1625960353283760/t0(0) o106->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1551313390 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 27 16:23:10 fir-md1-s1 kernel: Lustre: 22132:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 27 16:23:57 fir-md1-s1 kernel: LustreError: 22132:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) returned error from glimpse AST (req@ffff9cd056b82d00 x1625960353283760 status -107 rc -107), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cc25730e9c0/0xb7044c65323d4b98 lrc: 4/0,0 mode: PW/PW res: [0x2c00074a9:0x15:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.20.15@o2ib6 remote: 0x649cfea256792a6f expref: 77 pid: 22185 timeout: 0 lvb_type: 0 Feb 27 16:23:57 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock glimpse callback time out: rc -107 Feb 27 16:23:57 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1556s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cc25730e9c0/0xb7044c65323d4b98 lrc: 4/0,0 mode: PW/PW res: [0x2c00074a9:0x15:0x0].0x0 bits 0x40/0x0 rrc: 7 type: IBT flags: 0x40200000000000 nid: 10.8.20.15@o2ib6 remote: 0x649cfea256792a6f expref: 78 pid: 22185 timeout: 0 lvb_type: 0 Feb 27 16:24:19 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 1be32e77-4dff-95fa-5a66-67760221c941 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb1e839bc00, cur 1551313459 expire 1551313309 last 1551313232 Feb 27 16:26:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 92360cf1-e21f-b2b4-ad08-6e3a83ed41ed (at 10.8.10.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cba1eb21800, cur 1551313614 expire 1551313464 last 1551313387 Feb 27 16:26:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 16:27:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 16:27:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:32:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 16:32:13 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Feb 27 16:37:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 16:37:15 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 16:42:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 16:42:14 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:47:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 16:47:16 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:52:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 16:52:15 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 16:57:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 16:57:18 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 17:02:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection 
restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 17:02:16 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 17:07:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 17:07:19 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 17:12:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 17:12:17 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 17:17:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 17:17:20 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 17:22:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 17:22:18 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 17:27:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 17:27:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 17:32:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 17:32:19 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 17:33:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 75075323-e215-772e-2f5a-cb5ed5087a2d (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c99eba7a400, cur 1551317605 expire 1551317455 last 1551317378 Feb 27 17:33:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 17:33:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 75075323-e215-772e-2f5a-cb5ed5087a2d (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cab12688400, cur 1551317606 expire 1551317456 last 1551317379 Feb 27 17:37:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 17:37:22 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 17:42:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 17:42:20 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Feb 27 17:47:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 17:47:23 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 17:52:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 17:52:21 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 17:57:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 17:57:24 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 17:57:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a2ccc640-0142-52e3-e3a8-9859992ac8f9 (at 10.9.102.15@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc34ea04800, cur 1551319050 expire 1551318900 last 1551318823 Feb 27 17:57:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:02:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0adc7979-82ee-6805-7daa-e6743e532923 (at 10.8.6.36@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb293316000, cur 1551319322 expire 1551319172 last 1551319095 Feb 27 18:02:02 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Feb 27 18:02:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to 85b46e79-68b6-aa40-1b7b-41cfd6897156 (at 10.9.101.14@o2ib4) Feb 27 18:02:23 fir-md1-s1 kernel: Lustre: Skipped 8 previous similar messages Feb 27 18:07:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 18:07:25 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Feb 27 18:08:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0098b3ca-9c64-c96b-0dc4-0d4b5b3a3268 (at 10.8.4.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc4b9e6fc00, cur 1551319710 expire 1551319560 last 1551319483 Feb 27 18:08:30 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 18:09:09 fir-md1-s1 kernel: EXT4-fs (sdk2): error count since last fsck: 5 Feb 27 18:09:09 fir-md1-s1 kernel: EXT4-fs (sdk2): initial error at time 1550022155: ext4_mb_generate_buddy:757 Feb 27 18:09:09 fir-md1-s1 kernel: EXT4-fs (sdk2): last error at time 1550448029: ext4_mb_generate_buddy:757 Feb 27 18:13:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 18:13:34 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 27 18:17:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 18:17:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:18:41 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 91f82685-8c0d-ca47-2535-2fe64ad08eab (at 10.9.107.34@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb78d2fd400, cur 1551320321 expire 1551320171 last 1551320094 Feb 27 18:18:41 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Feb 27 18:23:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 18:23:35 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 18:27:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 18:27:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:33:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 18:33:37 fir-md1-s1 kernel: Lustre: Skipped 58 previous similar messages Feb 27 18:37:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 18:37:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:43:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 18:43:38 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Feb 27 18:47:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 18:47:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:53:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 18:53:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:57:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 18:57:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 18:59:41 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client bc731d0c-feae-61bf-7ef5-05fbfddd169e (at 
10.8.9.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca1cfbc8400, cur 1551322781 expire 1551322631 last 1551322554 Feb 27 18:59:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 19:03:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 19:03:40 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 19:07:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 19:07:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:13:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 19:13:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:17:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 19:17:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:23:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 19:23:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:27:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 19:27:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:33:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 19:33:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:37:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 19:37:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:43:44 fir-md1-s1 kernel: Lustre: 
fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 19:43:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:47:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 19:47:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:53:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 19:53:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 19:57:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 19:57:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:03:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 20:03:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:07:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 20:07:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:13:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 20:13:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 20:17:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:23:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 20:23:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:27:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 
(at 10.9.108.51@o2ib4) reconnecting Feb 27 20:27:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:33:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 20:33:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:37:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 20:37:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:43:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 20:43:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:47:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 20:47:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:53:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 20:53:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 20:57:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 20:57:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:03:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eb2d653c-5f78-e201-f039-efda162bfa3d (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb44dbea800, cur 1551330219 expire 1551330069 last 1551329992 Feb 27 21:03:39 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 21:03:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 21:03:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:07:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 21:07:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:13:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 21:13:54 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 21:15:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0ee32720-6faf-1dc2-9639-7c8c0223fc6f (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c998168f800, cur 1551330943 expire 1551330793 last 1551330716 Feb 27 21:15:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 21:16:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0ee32720-6faf-1dc2-9639-7c8c0223fc6f (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb1f6b07000, cur 1551330960 expire 1551330810 last 1551330733 Feb 27 21:16:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 686a9563-5d4d-7a20-2915-9dc469e71dde (at 10.9.113.7@o2ib4) in 223 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc2053d3c00, cur 1551331019 expire 1551330869 last 1551330796 Feb 27 21:16:59 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 27 21:17:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 21:17:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:23:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 21:23:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:23:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 21:23:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:27:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 21:27:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:33:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 21:33:56 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 21:37:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 21:37:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:43:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 21:43:57 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 21:47:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 21:47:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:53:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 
10.9.108.38@o2ib4) Feb 27 21:53:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 21:55:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 9211fe35-0c9d-5fa3-d53b-1793ba0106aa (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb6d3366400, cur 1551333341 expire 1551333191 last 1551333114 Feb 27 21:57:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 21:57:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:03:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 22:03:59 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 22:07:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 22:07:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:14:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 22:14:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:17:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 22:17:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:24:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 22:24:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:27:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 22:27:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:34:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 
10.9.108.38@o2ib4) Feb 27 22:34:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:37:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 22:37:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:44:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 22:44:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:47:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 22:47:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:50:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 0a1d480b-55e3-0f6f-5c5a-b5feda8b310f (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cadb7a96c00, cur 1551336645 expire 1551336495 last 1551336418 Feb 27 22:50:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 22:50:56 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c0c3c25c-0465-1651-f0d7-61917a328d17 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbd1c2c2800, cur 1551336656 expire 1551336506 last 1551336429 Feb 27 22:50:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:54:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 22:54:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 22:57:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 22:57:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:02:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef6e940b-980e-1ede-0e9d-fe0e36937f1c (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb6c5f85800, cur 1551337360 expire 1551337210 last 1551337133 Feb 27 23:04:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 23:04:06 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 23:06:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 789dd17b-09a7-4efa-2265-a78b0f7e8a03 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb4607d1000, cur 1551337597 expire 1551337447 last 1551337370 Feb 27 23:06:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 27 23:06:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 789dd17b-09a7-4efa-2265-a78b0f7e8a03 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb974628c00, cur 1551337609 expire 1551337459 last 1551337382 Feb 27 23:06:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:07:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 23:07:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:14:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 23:14:07 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 27 23:17:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 23:17:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:24:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 23:24:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:27:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 23:27:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:34:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 23:34:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:38:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 23:38:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:44:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 23:44:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:47:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 69b34652-c357-7e9a-4564-7fad9c281069 (at 
10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ccb4aa8d000, cur 1551340020 expire 1551339870 last 1551339793 Feb 27 23:48:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 23:48:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 27 23:54:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 27 23:54:11 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 27 23:58:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 27 23:58:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:04:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 00:04:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:08:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 00:08:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:14:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 00:14:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:18:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 00:18:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:24:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 00:24:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:28:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) 
reconnecting Feb 28 00:28:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:34:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 00:34:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:35:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client fd0ec97a-cf23-ffd3-7938-958f490cce96 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbf99b4d800, cur 1551342928 expire 1551342778 last 1551342701 Feb 28 00:35:28 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 00:38:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 00:38:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:42:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a240c881-ef1a-99ee-2e79-8ae099644fe3 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc1946ee400, cur 1551343370 expire 1551343220 last 1551343143 Feb 28 00:42:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 00:43:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8aef762d-9b56-cd87-8e3a-ac3fe7c0c7e4 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbe17789c00, cur 1551343382 expire 1551343232 last 1551343155 Feb 28 00:43:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:44:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 00:44:16 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 28 00:46:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a83c90fa-9af4-f8a9-25e0-2ee34d4d33e5 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc4a9e8f800, cur 1551343568 expire 1551343418 last 1551343341 Feb 28 00:48:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 00:48:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 00:54:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 00:54:17 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 00:58:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 00:58:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:03:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a415e366-7aa0-e275-36c5-cc6ca510eea6 (at 10.9.113.6@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca386d74000, cur 1551344635 expire 1551344485 last 1551344408 Feb 28 01:03:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 01:04:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 01:04:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:08:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 01:08:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:14:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 01:14:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:18:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 01:18:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:24:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection 
restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 01:24:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:28:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 01:28:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:34:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 01:34:22 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 01:38:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 01:38:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:44:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 01:44:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:48:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 01:48:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:54:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 01:54:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 01:58:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 01:58:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:04:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 02:04:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:08:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) 
reconnecting Feb 28 02:08:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:14:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 02:14:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:18:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 02:18:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:24:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 02:24:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:28:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 02:28:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:34:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 02:34:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:38:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 02:38:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:44:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 02:44:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:48:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 02:48:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 02:54:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 02:54:30 fir-md1-s1 kernel: Lustre: 
Skipped 1 previous similar message Feb 28 02:58:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 02:58:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:04:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 03:04:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:08:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 03:08:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:14:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 03:14:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:18:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 03:18:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:24:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 03:24:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 03:28:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:34:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 03:34:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:38:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 03:38:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:44:36 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 03:44:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:48:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 03:48:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 03:54:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:58:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 03:58:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 03:59:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8b467f67-701e-6281-4801-dec7e28f7c79 (at 10.9.104.11@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9fef068400, cur 1551355174 expire 1551355024 last 1551354947 Feb 28 03:59:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 04:04:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 04:04:38 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 04:08:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 04:08:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:14:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 04:14:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:18:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 04:18:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:24:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 04:24:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:28:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 04:28:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:34:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 04:34:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:38:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 04:38:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:44:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 
10.9.108.38@o2ib4) Feb 28 04:44:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:48:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 04:48:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:54:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 04:54:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 04:58:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 04:58:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:04:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 05:04:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:08:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 05:08:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:14:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 05:14:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:18:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 05:18:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:24:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 05:24:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:28:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 05:28:38 fir-md1-s1 kernel: Lustre: 
Skipped 1 previous similar message Feb 28 05:29:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3879140b-0065-9855-1189-63f86dc8c822 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd0f2ed7c00, cur 1551360590 expire 1551360440 last 1551360363 Feb 28 05:29:50 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 05:34:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 05:34:47 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 05:38:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 05:38:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:41:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client c038235a-cde9-3cf1-6cb0-faf4262482f4 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cba48e15800, cur 1551361294 expire 1551361144 last 1551361067 Feb 28 05:41:34 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 05:44:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 05:44:49 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 05:48:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 05:48:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:54:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 05:54:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 05:58:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 05:58:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:04:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 06:04:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:07:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 1c2ad138-1eff-d341-1c94-9fdd6e306d9b (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd0d8a4e000, cur 1551362863 expire 1551362713 last 1551362636 Feb 28 06:07:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 06:08:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 06:08:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:14:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 4c04773e-96c1-f080-44fd-628a8575903e (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb9eeee3800, cur 1551363281 expire 1551363131 last 1551363054 Feb 28 06:14:41 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 06:14:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 06:14:52 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 06:18:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 06:18:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:23:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ac4380c7-7e83-5a11-7c23-071937dd99e4 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cafcc733400, cur 1551363798 expire 1551363648 last 1551363571 Feb 28 06:23:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 06:24:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 06:24:53 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 28 06:28:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 06:28:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:34:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 06:34:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:38:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 06:38:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:44:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 06:44:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous 
similar message Feb 28 06:48:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ef9aaf92-f53c-694e-3d95-8f74472277a4 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cab044de800, cur 1551365289 expire 1551365139 last 1551365062 Feb 28 06:48:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 06:48:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 06:48:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 06:54:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 06:54:56 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 06:58:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 06:58:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:04:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 7b8d0b0b-ae9b-5ddd-4c6c-79308e8c2b7a (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca45d72b400, cur 1551366240 expire 1551366090 last 1551366013 Feb 28 07:04:00 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 07:04:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 07:04:57 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 07:08:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 07:08:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:10:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a6a8a0bd-48a0-c61a-b31f-ff9d3b3455d3 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb41bf40400, cur 1551366612 expire 1551366462 last 1551366385 Feb 28 07:10:12 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 07:14:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 07:14:58 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 07:16:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client bd5ffe7d-13d0-5670-bb96-f3c034eb281f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca781f3c400, cur 1551367016 expire 1551366866 last 1551366789 Feb 28 07:16:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 07:18:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 07:18:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:22:55 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9914b636-7eb4-b7d7-6978-3fbe68e49ac4 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb04942a400, cur 1551367375 expire 1551367225 last 1551367148 Feb 28 07:22:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 07:24:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 07:24:59 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 28 07:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 07:28:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:28:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client aefa195c-5342-2e13-5254-dbf8c9b8e6bb (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ca0e979a400, cur 1551367736 expire 1551367586 last 1551367509 Feb 28 07:28:56 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 07:34:48 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 145c0f2d-c289-795e-d5b0-42c04c065a34 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbb24bb4c00, cur 1551368088 expire 1551367938 last 1551367861 Feb 28 07:34:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 07:35:00 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 07:35:00 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 28 07:38:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 07:38:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:40:20 fir-md1-s1 kernel: LustreError: 50701:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0007588:0xa51d:0x0] Feb 28 07:45:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 07:45:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:48:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 07:48:53 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:50:27 fir-md1-s1 kernel: LustreError: 50809:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0002: BRW to missing obj [0x2c0007588:0xab5c:0x0] Feb 28 07:55:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 07:55:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 07:58:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting 
Feb 28 07:58:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:05:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 08:05:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:08:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 08:08:55 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:15:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 08:15:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:18:56 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 08:18:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:25:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 08:25:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:28:57 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 08:28:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:35:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 08:35:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:36:46 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 384564b1-f4b2-b616-630e-bb7cb3901fc3 (at 10.8.13.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caa5c395400, cur 1551371806 expire 1551371656 last 1551371579 Feb 28 08:36:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 08:38:58 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 08:38:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:45:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 08:45:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:48:59 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 08:48:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:52:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 809366d4-4623-4b23-12da-b85dfc5e4a98 (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc8f72df800, cur 1551372721 expire 1551372571 last 1551372494 Feb 28 08:52:01 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 08:52:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b88b5f9c-341a-6922-1f2d-90bc070377cb (at 10.8.9.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbb6fb2dc00, cur 1551372730 expire 1551372580 last 1551372503 Feb 28 08:52:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 08:55:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 08:55:09 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 08:59:01 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 08:59:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:05:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 09:05:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:09:02 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 09:09:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:15:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 09:15:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:19:03 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 09:19:03 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:25:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 09:25:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:29:04 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 09:29:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:35:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 
10.9.108.38@o2ib4) Feb 28 09:35:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:39:05 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 09:39:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:45:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 09:45:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:49:06 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 09:49:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:55:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 09:55:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 09:56:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client b72d6051-cdad-cb35-92ee-c51c9ba641cc (at 10.9.104.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caa9d6e8000, cur 1551376575 expire 1551376425 last 1551376348 Feb 28 09:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 09:59:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:05:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 10:05:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:09:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 10:09:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:15:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 10:15:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:19:09 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 10:19:09 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:25:19 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.104.1@o2ib4) Feb 28 10:25:19 fir-md1-s1 kernel: Lustre: Skipped 16 previous similar messages Feb 28 10:28:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 2d486816-0fee-3e02-cb25-89549e06c293 (at 10.9.104.52@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caf05274800, cur 1551378524 expire 1551378374 last 1551378297 Feb 28 10:28:44 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Feb 28 10:29:10 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 10:29:10 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:30:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 4e4aa96f-0fb9-1802-c39f-f7dc18ae372f (at 10.9.106.61@o2ib4) in 223 seconds. I think it's dead, and I am evicting it. exp ffff9cc214313800, cur 1551378600 expire 1551378450 last 1551378377 Feb 28 10:30:00 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 28 10:35:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 10:35:20 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 28 10:39:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 10:39:11 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:41:25 fir-md1-s1 kernel: Lustre: 22159:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Feb 28 10:45:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 10:45:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:47:45 fir-md1-s1 kernel: Lustre: 21883:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0002: Failure to clear the changelog for user 1: -22 Feb 28 10:49:12 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 10:49:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 10:55:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 
10.9.108.38@o2ib4) Feb 28 10:55:22 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 10:59:13 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 10:59:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:05:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 11:05:23 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Feb 28 11:09:15 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 11:09:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:15:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 11:15:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:19:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 11:19:16 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:25:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 11:25:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:29:17 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 11:29:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:35:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 11:35:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:39:18 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 11:39:18 fir-md1-s1 kernel: 
Lustre: Skipped 1 previous similar message Feb 28 11:44:08 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 3e1bfda9-cfe9-bdcb-ad30-df8e6dbb01f9 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cabb3591400, cur 1551383048 expire 1551382898 last 1551382821 Feb 28 11:44:11 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e1bfda9-cfe9-bdcb-ad30-df8e6dbb01f9 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cc483620400, cur 1551383051 expire 1551382901 last 1551382824 Feb 28 11:45:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 11:45:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:49:19 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 11:49:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:52:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3e426987-a264-d1c6-f8da-8ba52e379c08 (at 10.8.27.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb4611e2000, cur 1551383528 expire 1551383378 last 1551383301 Feb 28 11:52:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 11:55:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 11:55:28 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Feb 28 11:59:20 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 11:59:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:02:53 fir-md1-s1 kernel: Lustre: 22175:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551384166/real 1551384166] req@ffff9ca843a4e000 x1625960620359552/t0(0) o104->fir-MDT0000@10.9.107.66@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551384173 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 28 12:02:53 fir-md1-s1 kernel: Lustre: 22175:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Feb 28 12:03:07 fir-md1-s1 kernel: Lustre: 22175:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551384180/real 1551384180] req@ffff9ca843a4e000 x1625960620359552/t0(0) o104->fir-MDT0000@10.9.107.66@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551384187 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 12:03:07 fir-md1-s1 kernel: Lustre: 22175:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 7 previous similar messages Feb 28 12:03:09 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a6f7aa0c-5066-376a-a5cc-37cdddd1fe39 (at 10.9.106.22@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cd1967a3c00, cur 1551384189 expire 1551384039 last 1551383962 Feb 28 12:03:09 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 12:05:29 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 12:05:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:09:21 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 12:09:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:15:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 12:15:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:19:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 12:19:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:25:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 12:25:32 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 12:29:23 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 12:29:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:32:48 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551385961/real 1551385961] req@ffff9cabfca42d00 x1625960625327504/t0(0) o104->fir-MDT0000@10.9.106.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551385968 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 28 12:32:48 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Feb 28 12:32:55 fir-md1-s1 kernel: Lustre: 
22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551385968/real 1551385968] req@ffff9cabfca42d00 x1625960625327504/t0(0) o104->fir-MDT0000@10.9.106.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551385975 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 12:33:02 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551385975/real 1551385975] req@ffff9cabfca42d00 x1625960625327504/t0(0) o104->fir-MDT0000@10.9.106.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551385982 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 12:33:16 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551385989/real 1551385989] req@ffff9cabfca42d00 x1625960625327504/t0(0) o104->fir-MDT0000@10.9.106.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551385996 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 12:33:16 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 28 12:33:37 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551386010/real 1551386010] req@ffff9cabfca42d00 x1625960625327504/t0(0) o104->fir-MDT0000@10.9.106.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551386017 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 12:33:37 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 28 12:34:19 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551386052/real 1551386052] req@ffff9cabfca42d00 x1625960625327504/t0(0) o104->fir-MDT0000@10.9.106.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1551386059 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 12:34:19 fir-md1-s1 kernel: Lustre: 22189:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous 
similar messages Feb 28 12:35:15 fir-md1-s1 kernel: LustreError: 22189:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.9.106.35@o2ib4) failed to reply to blocking AST (req@ffff9cabfca42d00 x1625960625327504 status 0 rc -110), evict it ns: mdt-fir-MDT0000_UUID lock: ffff9cc19df0f740/0xb7044c65ace6372d lrc: 4/0,0 mode: PR/PR res: [0x2000068f5:0x48:0x0].0x0 bits 0x1b/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.106.35@o2ib4 remote: 0xa225731b063a6a51 expref: 171 pid: 22214 timeout: 1371237 lvb_type: 0 Feb 28 12:35:15 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.9.106.35@o2ib4 was evicted due to a lock blocking callback time out: rc -110 Feb 28 12:35:15 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.9.106.35@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffff9cc19df0f740/0xb7044c65ace6372d lrc: 3/0,0 mode: PR/PR res: [0x2000068f5:0x48:0x0].0x0 bits 0x1b/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.9.106.35@o2ib4 remote: 0xa225731b063a6a51 expref: 172 pid: 22214 timeout: 0 lvb_type: 0 Feb 28 12:35:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 12:35:33 fir-md1-s1 kernel: Lustre: Skipped 25 previous similar messages Feb 28 12:35:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 36d97814-a44e-773c-9e61-e3a9072c045f (at 10.9.106.56@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9caad9a7cc00, cur 1551386140 expire 1551385990 last 1551385913 Feb 28 12:35:40 fir-md1-s1 kernel: Lustre: Skipped 26 previous similar messages Feb 28 12:35:49 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 8d3e304f-0a26-5080-2ece-6e1b7cec7ea5 (at 10.9.106.35@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbc846d0400, cur 1551386149 expire 1551385999 last 1551385922 Feb 28 12:35:49 fir-md1-s1 kernel: Lustre: Skipped 86 previous similar messages Feb 28 12:39:24 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 12:39:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:45:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 12:45:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:49:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 12:49:25 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:55:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 12:55:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 12:59:26 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 12:59:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:05:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 13:05:36 fir-md1-s1 kernel: Lustre: Skipped 106 previous similar messages Feb 28 13:09:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 13:09:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:15:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 13:15:37 fir-md1-s1 kernel: Lustre: Skipped 28 previous similar messages Feb 28 13:19:28 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 
10.9.108.51@o2ib4) reconnecting Feb 28 13:19:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:25:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 13:25:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:29:30 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 13:29:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:35:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 13:35:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:39:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 13:39:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:45:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 13:45:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:49:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 13:49:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:55:41 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 13:55:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 13:59:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 13:59:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:05:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 14:05:42 fir-md1-s1 
kernel: Lustre: Skipped 1 previous similar message Feb 28 14:09:34 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 14:09:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:15:43 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 14:15:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:19:35 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 14:19:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:25:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 14:25:44 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 14:29:36 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 14:29:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:35:45 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 14:35:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:39:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 14:39:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:40:10 fir-md1-s1 kernel: LNet: Service thread pid 51459 was inactive for 200.20s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 14:40:10 fir-md1-s1 kernel: Pid: 51459, comm: mdt03_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:40:10 fir-md1-s1 kernel: Call Trace: Feb 28 14:40:10 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 14:40:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:40:10 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:40:10 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:40:10 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:40:10 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:40:10 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:40:10 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:40:10 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:40:10 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:40:10 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:40:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393610.51459 Feb 28 14:40:11 fir-md1-s1 kernel: LNet: Service thread pid 22249 was inactive for 200.95s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 14:40:11 fir-md1-s1 kernel: Pid: 22249, comm: mdt01_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:40:11 fir-md1-s1 kernel: Call Trace: Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:40:11 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:40:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:40:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:40:11 fir-md1-s1 kernel: Pid: 22251, comm: mdt00_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:40:11 fir-md1-s1 kernel: Call Trace: Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 14:40:11 
fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:40:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:40:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:40:11 fir-md1-s1 kernel: Pid: 48743, comm: mdt03_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:40:11 fir-md1-s1 kernel: Call Trace: Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:40:11 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:40:11 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:40:11 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:40:11 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:40:11 fir-md1-s1 kernel: Pid: 21983, comm: mdt00_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:40:11 fir-md1-s1 kernel: Call Trace: Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:40:11 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:40:11 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:40:12 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:40:12 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:40:12 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:40:12 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:40:12 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:40:12 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:40:12 
fir-md1-s1 kernel: LNet: Service thread pid 51417 was inactive for 202.32s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:40:12 fir-md1-s1 kernel: LNet: Skipped 260 previous similar messages Feb 28 14:40:13 fir-md1-s1 kernel: LNet: Service thread pid 21548 was inactive for 200.74s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:40:13 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Feb 28 14:40:13 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393613.21548 Feb 28 14:40:18 fir-md1-s1 kernel: LNet: Service thread pid 22142 was inactive for 200.66s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:40:18 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Feb 28 14:40:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393618.22142 Feb 28 14:40:19 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393619.21796 Feb 28 14:40:25 fir-md1-s1 kernel: LNet: Service thread pid 21546 was inactive for 200.47s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:40:25 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Feb 28 14:40:25 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393625.21546 Feb 28 14:40:32 fir-md1-s1 kernel: LNet: Service thread pid 21929 was inactive for 200.42s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:40:32 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Feb 28 14:40:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393632.21929 Feb 28 14:40:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393635.51414 Feb 28 14:40:42 fir-md1-s1 kernel: LNet: Service thread pid 22233 was inactive for 200.32s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Feb 28 14:40:42 fir-md1-s1 kernel: LNet: Skipped 2 previous similar messages Feb 28 14:40:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393642.22233 Feb 28 14:40:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393647.22214 Feb 28 14:40:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393649.22137 Feb 28 14:41:42 fir-md1-s1 kernel: LNet: Service thread pid 48757 was inactive for 212.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:41:42 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Feb 28 14:41:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393702.48757 Feb 28 14:41:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393705.22268 Feb 28 14:41:49 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 14:41:49 fir-md1-s1 kernel: LustreError: 51417:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393409, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0002_UUID lock: ffff9cca6d62af40/0xb7044c65c10730b7 lrc: 3/1,0 mode: --/PR res: [0x2c000744a:0x18884:0x0].0x0 bits 0x13/0x8 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 51417 timeout: 0 lvb_type: 0 Feb 28 14:41:49 fir-md1-s1 kernel: LustreError: 51459:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393409, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cd30ba28480/0xb7044c65c1072fb4 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c1072fd0 expref: -99 pid: 51459 timeout: 0 lvb_type: 0 Feb 28 14:41:49 fir-md1-s1 kernel: Lustre: 
fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Feb 28 14:41:50 fir-md1-s1 kernel: LustreError: 22249:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393410, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9ca658e960c0/0xb7044c65c108389c lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c10838a3 expref: -99 pid: 22249 timeout: 0 lvb_type: 0 Feb 28 14:41:50 fir-md1-s1 kernel: LustreError: 22249:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 28 14:41:53 fir-md1-s1 kernel: LustreError: 51445:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393413, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cd30ba2c380/0xb7044c65c10b62a4 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c10b62ab expref: -99 pid: 51445 timeout: 0 lvb_type: 0 Feb 28 14:41:57 fir-md1-s1 kernel: LustreError: 22142:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393417, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cad777c8d80/0xb7044c65c11120f8 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c1112114 expref: -99 pid: 22142 timeout: 0 lvb_type: 0 Feb 28 14:41:58 fir-md1-s1 kernel: LustreError: 21796:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393418, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cb4f5b08480/0xb7044c65c112779d lrc: 3/0,1 mode: --/EX res: 
[0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 129 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21796 timeout: 0 lvb_type: 0 Feb 28 14:41:58 fir-md1-s1 kernel: LustreError: 21796:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Feb 28 14:42:04 fir-md1-s1 kernel: LustreError: 21546:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393424, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9c9e88f9f080/0xb7044c65c11a7d4b lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c11a7d52 expref: -99 pid: 21546 timeout: 0 lvb_type: 0 Feb 28 14:42:12 fir-md1-s1 kernel: LustreError: 21974:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393432, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cbd04b3ec00/0xb7044c65c123ed40 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 129 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21974 timeout: 0 lvb_type: 0 Feb 28 14:42:21 fir-md1-s1 kernel: LustreError: 22233:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393441, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cb706b6e0c0/0xb7044c65c13045f8 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c13045ff expref: -99 pid: 22233 timeout: 0 lvb_type: 0 Feb 28 14:42:21 fir-md1-s1 kernel: LustreError: 22233:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 28 14:42:40 fir-md1-s1 kernel: LNet: Service thread pid 22241 was inactive for 236.44s. 
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:42:40 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Feb 28 14:42:40 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393760.22241 Feb 28 14:42:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393763.21826 Feb 28 14:42:57 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393777.51431 Feb 28 14:43:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393785.22159 Feb 28 14:43:09 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393789.47900 Feb 28 14:43:10 fir-md1-s1 kernel: LustreError: 48757:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393490, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9c94d081f740/0xb7044c65c16e036c lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 27 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c16e0373 expref: -99 pid: 48757 timeout: 0 lvb_type: 0 Feb 28 14:43:10 fir-md1-s1 kernel: LustreError: 48757:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Feb 28 14:43:10 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393790.21878 Feb 28 14:43:42 fir-md1-s1 kernel: LustreError: 22159:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393522, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9c9ed6e157c0/0xb7044c65c197eaf3 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 28 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c197eb0f expref: -99 pid: 22159 timeout: 0 lvb_type: 0 Feb 28 14:43:42 fir-md1-s1 kernel: LustreError: 22159:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 28 14:44:27 
fir-md1-s1 kernel: LNet: Service thread pid 22157 was inactive for 313.10s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 14:44:27 fir-md1-s1 kernel: LNet: Skipped 5 previous similar messages Feb 28 14:44:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393867.22157 Feb 28 14:45:27 fir-md1-s1 kernel: LNet: Service thread pid 21779 was inactive for 362.88s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 28 14:45:27 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Feb 28 14:45:27 fir-md1-s1 kernel: Pid: 21779, comm: mdt01_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:45:27 fir-md1-s1 kernel: Call Trace: Feb 28 14:45:27 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 14:45:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:45:27 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:45:27 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:45:27 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:45:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:45:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:45:27 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:45:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:45:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:45:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:45:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393927.21779 Feb 28 14:45:31 fir-md1-s1 kernel: LNet: Service thread pid 22174 was inactive for 362.49s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 28 14:45:31 fir-md1-s1 kernel: Pid: 22174, comm: mdt00_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:45:31 fir-md1-s1 kernel: Call Trace: Feb 28 14:45:31 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 14:45:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:45:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:45:31 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:45:31 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:45:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:45:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:45:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:45:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 
14:45:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:45:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:45:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551393931.22174 Feb 28 14:45:42 fir-md1-s1 kernel: LustreError: 21759:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393642, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9ca8061b3600/0xb7044c65c2335c3a lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 28 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c2335c41 expref: -99 pid: 21759 timeout: 0 lvb_type: 0 Feb 28 14:45:42 fir-md1-s1 kernel: LustreError: 21759:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Feb 28 14:45:45 fir-md1-s1 kernel: LustreError: 22284:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393645, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9c9e6b6d2880/0xb7044c65c236a877 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 138 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22284 timeout: 0 lvb_type: 0 Feb 28 14:45:45 fir-md1-s1 kernel: LustreError: 22284:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 28 14:45:47 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 14:45:47 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 28 14:46:37 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client a73a38e8-7a0c-0288-aa98-7ef6ed07112b (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cb0fae64800, cur 1551393997 expire 1551393847 last 1551393770 Feb 28 14:46:37 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Feb 28 14:46:44 fir-md1-s1 kernel: Lustre: 21965:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cad2c3da700 x1626125905700320/t0(0) o101->331d55ae-d9ca-503e-8727-d0c89ef082cb@10.8.27.35@o2ib6:379/0 lens 592/3264 e 24 to 0 dl 1551394009 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:46:48 fir-md1-s1 kernel: Lustre: 21247:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cceaf25e300 x1626363549477264/t0(0) o36->646618d1-6745-ba9a-c427-133f76bdb12b@10.9.103.20@o2ib4:383/0 lens 536/2888 e 24 to 0 dl 1551394013 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:46:48 fir-md1-s1 kernel: Lustre: 21247:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Feb 28 14:46:59 fir-md1-s1 kernel: Lustre: 21935:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9ca41aa87500 x1626105502343840/t0(0) o36->1fb97d42-6f2c-31eb-ac34-9428db915f5a@10.9.0.61@o2ib4:394/0 lens 520/2888 e 15 to 0 dl 1551394024 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:46:59 fir-md1-s1 kernel: Lustre: 21935:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Feb 28 14:47:17 fir-md1-s1 kernel: Lustre: 21925:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cbb76f93f00 x1626389659838608/t0(0) o36->9fcde41d-19ca-aff3-e4b9-6a6b3e517ea5@10.9.108.15@o2ib4:411/0 lens 600/2888 e 21 to 0 dl 1551394041 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:47:17 fir-md1-s1 kernel: Lustre: 21925:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Feb 28 14:47:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 
10.0.10.52@o2ib7, removing former export from same NID Feb 28 14:47:17 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 14:47:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 14:48:05 fir-md1-s1 kernel: Lustre: 51419:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cd27ff9c500 x1626261310564080/t0(0) o36->2b5ea6bb-9bbc-8164-5ec8-7cffe32aebcd@10.9.108.42@o2ib4:460/0 lens 784/2888 e 5 to 0 dl 1551394090 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:48:05 fir-md1-s1 kernel: Lustre: 51419:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Feb 28 14:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 14:48:22 fir-md1-s1 kernel: LustreError: 21795:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393802, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cb3dcdbd340/0xb7044c65c2f8cd57 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 28 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c2f8cd5e expref: -99 pid: 21795 timeout: 0 lvb_type: 0 Feb 28 14:48:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Feb 28 14:48:51 fir-md1-s1 kernel: LNet: Service thread pid 21919 was inactive for 486.06s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 14:48:51 fir-md1-s1 kernel: Pid: 21919, comm: mdt02_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:48:51 fir-md1-s1 kernel: Call Trace: Feb 28 14:48:51 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 14:48:51 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 14:48:51 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 14:48:51 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:48:51 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:48:51 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:48:51 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:48:51 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:48:51 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:48:51 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:48:51 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:48:51 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:48:51 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:48:51 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551394131.21919 Feb 28 14:49:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client cfa92e15-9f67-a8be-d51e-06608e1376a7 (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c99362f6c00, cur 1551394154 expire 1551394004 last 1551393927 Feb 28 14:49:14 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 14:49:15 fir-md1-s1 kernel: LNet: Service thread pid 21759 was inactive for 513.08s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 14:49:15 fir-md1-s1 kernel: Pid: 21759, comm: mdt01_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:49:15 fir-md1-s1 kernel: Call Trace: Feb 28 14:49:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 14:49:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:49:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:49:15 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:49:15 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:49:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:49:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:49:15 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:49:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:49:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:49:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:49:15 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551394155.21759 Feb 28 14:49:17 fir-md1-s1 kernel: Pid: 22284, comm: mdt00_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:49:17 fir-md1-s1 kernel: Call Trace: Feb 28 14:49:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 
14:49:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 14:49:17 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 14:49:17 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:49:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:49:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:49:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:49:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:49:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:49:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:49:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:49:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:49:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:49:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551394157.22284 Feb 28 14:49:19 fir-md1-s1 kernel: Lustre: 22175:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply#012 req@ffff9cac8566a100 x1626368187515536/t0(0) o36->a898ad4d-d0f7-2cbd-fb6c-7ee55a99e544@10.8.27.18@o2ib6:534/0 lens 536/2888 e 3 to 0 dl 1551394164 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:49:19 fir-md1-s1 kernel: Lustre: 22175:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 7 previous similar messages Feb 28 14:49:38 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 14:49:38 fir-md1-s1 kernel: Lustre: Skipped 29 previous similar messages Feb 28 14:50:29 fir-md1-s1 kernel: LNet: Service thread pid 21575 was inactive for 562.26s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 14:50:29 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Feb 28 14:50:29 fir-md1-s1 kernel: Pid: 21575, comm: mdt01_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:50:29 fir-md1-s1 kernel: Call Trace: Feb 28 14:50:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 14:50:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 14:50:29 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 14:50:29 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:50:29 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:50:29 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:50:29 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:50:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:50:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:50:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:50:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:50:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:50:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:50:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551394229.21575 Feb 28 14:51:03 fir-md1-s1 kernel: LustreError: 51470:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551393963, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9c94db6e4c80/0xb7044c65c3b4c143 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 157 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 51470 timeout: 0 lvb_type: 0 Feb 28 14:51:03 fir-md1-s1 kernel: LustreError: 51470:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Feb 28 14:52:43 fir-md1-s1 
kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 14:53:12 fir-md1-s1 kernel: LustreError: 22195:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551394092, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cbc67e70fc0/0xb7044c65c452d04c lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 165 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22195 timeout: 0 lvb_type: 0 Feb 28 14:53:12 fir-md1-s1 kernel: LustreError: 22195:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 28 14:53:18 fir-md1-s1 kernel: Lustre: 21959:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/4), not sending early reply#012 req@ffff9caba0b77800 x1626125632230128/t0(0) o36->f3b7f1d7-82ba-da21-3815-15a179e95f69@10.8.8.30@o2ib6:17/0 lens 520/2888 e 1 to 0 dl 1551394402 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:53:18 fir-md1-s1 kernel: Lustre: 21959:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Feb 28 14:55:48 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 14:55:48 fir-md1-s1 kernel: Lustre: Skipped 43 previous similar messages Feb 28 14:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 14:56:04 fir-md1-s1 kernel: LustreError: 21803:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551394264, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cbd3727a1c0/0xb7044c65c5224721 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 36 type: IBT flags: 
0x1000001000000 nid: local remote: 0xb7044c65c5224728 expref: -99 pid: 21803 timeout: 0 lvb_type: 0 Feb 28 14:56:04 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Feb 28 14:56:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 14:56:47 fir-md1-s1 kernel: LustreError: 47886:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551394307, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9ca0593b4ec0/0xb7044c65c554373c lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 172 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 47886 timeout: 0 lvb_type: 0 Feb 28 14:56:47 fir-md1-s1 kernel: LustreError: 47886:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Feb 28 14:56:56 fir-md1-s1 kernel: LNet: Service thread pid 21795 was inactive for 813.84s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 14:56:56 fir-md1-s1 kernel: Pid: 21795, comm: mdt01_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 14:56:56 fir-md1-s1 kernel: Call Trace: Feb 28 14:56:56 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 14:56:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 14:56:56 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 14:56:56 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 14:56:56 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 14:56:56 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 14:56:56 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 14:56:56 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 14:56:56 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 14:56:56 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 14:56:56 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 14:56:56 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551394616.21795 Feb 28 14:57:59 fir-md1-s1 kernel: Lustre: 22259:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-110), not sending early reply#012 req@ffff9cac6de70300 x1626589335333440/t0(0) 
o36->8c1a5c06-c8d1-1742-f2b6-163e94d737d7@10.8.18.25@o2ib6:299/0 lens 544/2888 e 0 to 0 dl 1551394684 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 14:57:59 fir-md1-s1 kernel: Lustre: 22259:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Feb 28 14:59:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 14:59:39 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 14:59:39 fir-md1-s1 kernel: Lustre: Skipped 37 previous similar messages Feb 28 15:02:17 fir-md1-s1 kernel: LNet: Service thread pid 51469 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 28 15:02:17 fir-md1-s1 kernel: Pid: 51469, comm: mdt03_053 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:02:17 fir-md1-s1 kernel: Call Trace: Feb 28 15:02:17 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:02:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:02:17 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:02:17 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:02:17 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 
[mdt] Feb 28 15:02:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:02:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:02:17 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:02:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:02:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:02:17 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:02:17 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551394937.51469 Feb 28 15:02:27 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:02:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Feb 28 15:04:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 15:05:18 fir-md1-s1 kernel: LNet: Service thread pid 47885 was inactive for 262.32s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 15:05:18 fir-md1-s1 kernel: Pid: 47885, comm: mdt03_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:05:18 fir-md1-s1 kernel: Call Trace: Feb 28 15:05:18 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:05:18 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:05:18 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:05:18 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:05:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:05:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:05:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:05:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:05:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:05:18 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:05:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:05:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:05:18 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:05:18 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395118.47885 Feb 28 15:05:28 fir-md1-s1 kernel: Pid: 51470, comm: mdt00_098 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:05:28 fir-md1-s1 kernel: Call Trace: Feb 28 15:05:28 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:05:28 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:05:28 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:05:28 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:05:28 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:05:28 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:05:28 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 
15:05:28 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:05:28 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:05:28 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:05:28 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:05:28 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:05:28 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:05:28 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395128.51470 Feb 28 15:05:32 fir-md1-s1 kernel: Pid: 22201, comm: mdt01_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:05:32 fir-md1-s1 kernel: Call Trace: Feb 28 15:05:32 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:05:32 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:05:32 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:05:32 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:05:32 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:05:32 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:05:32 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:05:32 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:05:32 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:05:32 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:05:32 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:05:32 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:05:32 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:05:32 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395132.22201 Feb 28 15:05:49 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 15:05:49 fir-md1-s1 kernel: Lustre: Skipped 53 previous similar messages Feb 28 15:05:56 fir-md1-s1 
kernel: LustreError: 47885:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551394856, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cbcffbb3cc0/0xb7044c65c796a822 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 185 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 47885 timeout: 0 lvb_type: 0 Feb 28 15:05:56 fir-md1-s1 kernel: LustreError: 47885:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Feb 28 15:06:46 fir-md1-s1 kernel: Pid: 21965, comm: mdt01_037 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:06:46 fir-md1-s1 kernel: Call Trace: Feb 28 15:06:46 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:06:46 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:06:46 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:06:46 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:06:46 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:06:46 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:06:46 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:06:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:06:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:06:46 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:06:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:06:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:06:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:06:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395206.21965 Feb 28 15:06:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 15:07:15 fir-md1-s1 
kernel: LustreError: 22248:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551394935, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9c968b043a80/0xb7044c65c79b5728 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 49 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c79b572f expref: -99 pid: 22248 timeout: 0 lvb_type: 0 Feb 28 15:07:15 fir-md1-s1 kernel: LustreError: 22248:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Feb 28 15:07:36 fir-md1-s1 kernel: Pid: 48745, comm: mdt03_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:07:36 fir-md1-s1 kernel: Call Trace: Feb 28 15:07:36 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:07:36 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:07:36 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:07:36 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:07:36 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:07:36 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:07:36 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:07:36 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:07:36 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:07:36 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:07:36 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:07:36 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:07:36 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:07:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395256.48745 Feb 28 15:07:42 fir-md1-s1 kernel: Lustre: 21820:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply#012 
req@ffff9cbc213f1800 x1626244907755696/t0(0) o36->b0f50f4d-0ea2-aeb8-0fcd-c8962c905da8@10.9.113.10@o2ib4:127/0 lens 528/2888 e 0 to 0 dl 1551395267 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 15:07:42 fir-md1-s1 kernel: Lustre: 21820:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 16 previous similar messages Feb 28 15:08:16 fir-md1-s1 kernel: Pid: 22195, comm: mdt02_055 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:08:16 fir-md1-s1 kernel: Call Trace: Feb 28 15:08:16 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:08:16 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:08:16 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:08:16 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:08:16 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:08:16 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:08:16 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:08:16 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:08:16 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:08:16 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:08:16 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:08:16 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:08:16 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:08:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395296.22195 Feb 28 15:09:22 fir-md1-s1 kernel: Pid: 22164, comm: mdt01_060 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:09:22 fir-md1-s1 kernel: Call Trace: Feb 28 15:09:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:09:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:09:22 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:09:22 
fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:09:22 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:09:22 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:09:22 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:09:22 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:09:22 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:09:22 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:09:22 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:09:22 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:09:22 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:09:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395362.22164 Feb 28 15:09:38 fir-md1-s1 kernel: LNet: Service thread pid 21595 was inactive for 1203.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 28 15:09:38 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Feb 28 15:09:38 fir-md1-s1 kernel: Pid: 21595, comm: mdt01_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:09:38 fir-md1-s1 kernel: Call Trace: Feb 28 15:09:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:09:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:09:38 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:09:38 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:09:38 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:09:38 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:09:38 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:09:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:09:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:09:38 fir-md1-s1 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:09:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:09:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:09:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:09:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395378.21595 Feb 28 15:09:40 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 15:09:40 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Feb 28 15:09:50 fir-md1-s1 kernel: Pid: 51451, comm: mdt03_049 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:09:50 fir-md1-s1 kernel: Call Trace: Feb 28 15:09:50 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:09:50 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:09:50 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:09:50 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:09:50 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:09:50 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:09:50 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:09:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:09:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:09:50 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:09:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:09:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:09:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:09:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395390.51451 Feb 28 15:11:08 fir-md1-s1 kernel: LNet: Service thread pid 21803 was inactive for 1203.90s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Feb 28 15:11:08 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395468.21803 Feb 28 15:11:12 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395472.22145 Feb 28 15:11:49 fir-md1-s1 kernel: LNet: Service thread pid 47886 was inactive for 1201.70s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 15:11:49 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Feb 28 15:11:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395509.47886 Feb 28 15:12:00 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:12:00 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 0@lo, removing former export from same NID Feb 28 15:12:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 15:12:01 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395521.22188 Feb 28 15:12:05 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395525.47879 Feb 28 15:13:23 fir-md1-s1 kernel: Pid: 22189, comm: mdt01_071 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:13:23 fir-md1-s1 kernel: Call Trace: Feb 28 15:13:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 
[ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:13:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:13:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:13:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395603.22189 Feb 28 15:13:23 fir-md1-s1 kernel: Pid: 22175, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:13:23 fir-md1-s1 kernel: Call Trace: Feb 28 15:13:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:13:23 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:13:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:13:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:13:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:13:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:14:59 fir-md1-s1 kernel: LustreError: 
21248:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551395399, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cd0a521e540/0xb7044c65c7cbfa76 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 212 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21248 timeout: 0 lvb_type: 0 Feb 28 15:14:59 fir-md1-s1 kernel: LustreError: 21248:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Feb 28 15:15:14 fir-md1-s1 kernel: Pid: 51439, comm: mdt00_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:15:14 fir-md1-s1 kernel: Call Trace: Feb 28 15:15:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:15:14 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 
kernel: [] kthread+0xd1/0xe0 Feb 28 15:15:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:15:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:15:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395714.51439 Feb 28 15:15:14 fir-md1-s1 kernel: Pid: 21291, comm: mdt01_003 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:15:14 fir-md1-s1 kernel: Call Trace: Feb 28 15:15:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:15:14 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:15:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:15:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:15:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:15:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:15:22 fir-md1-s1 kernel: Pid: 22266, comm: mdt02_077 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:15:22 
fir-md1-s1 kernel: Call Trace: Feb 28 15:15:22 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:15:22 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:15:22 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:15:22 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:15:22 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:15:22 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:15:22 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:15:22 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:15:22 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:15:22 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:15:22 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:15:22 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:15:22 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:15:22 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395722.22266 Feb 28 15:15:47 fir-md1-s1 kernel: LNet: Service thread pid 21763 was inactive for 1204.57s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 15:15:47 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Feb 28 15:15:47 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395747.21763 Feb 28 15:15:50 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 15:15:50 fir-md1-s1 kernel: Lustre: Skipped 57 previous similar messages Feb 28 15:17:29 fir-md1-s1 kernel: LNet: Service thread pid 22219 was inactive for 1202.08s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Feb 28 15:17:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395849.22219 Feb 28 15:17:39 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:17:39 fir-md1-s1 kernel: LustreError: 21928:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551395559, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cc00b7caf40/0xb7044c65c7da261a lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 60 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c7da2621 expref: -99 pid: 21928 timeout: 0 lvb_type: 0 Feb 28 15:17:39 fir-md1-s1 kernel: LustreError: 21928:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Feb 28 15:18:27 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ab6423f5-6890-6c98-d786-25703ca7a051 (at 10.8.15.6@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca438ab8400, cur 1551395907 expire 1551395757 last 1551395680 Feb 28 15:18:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 15:18:39 fir-md1-s1 kernel: LNet: Service thread pid 21952 was inactive for 1202.56s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 15:18:39 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Feb 28 15:18:39 fir-md1-s1 kernel: Pid: 21952, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:18:39 fir-md1-s1 kernel: Call Trace: Feb 28 15:18:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:18:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:18:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:18:39 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:18:39 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:18:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:18:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:18:39 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:18:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:18:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:18:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:18:39 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395919.21952 Feb 28 15:19:20 fir-md1-s1 kernel: Pid: 21959, comm: mdt01_034 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:19:20 fir-md1-s1 kernel: Call Trace: Feb 28 
15:19:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:19:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:19:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:19:20 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:19:20 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:19:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:19:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:19:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:19:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:19:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:19:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:19:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551395960.21959 Feb 28 15:19:30 fir-md1-s1 kernel: Lustre: 50316:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply#012 req@ffff9cb968752d00 x1626747728509264/t0(0) o36->2311e0c5-324d-c20c-3aff-483d234118f3@10.9.108.2@o2ib4:80/0 lens 520/2888 e 0 to 0 dl 1551395975 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 15:19:30 fir-md1-s1 kernel: Lustre: 50316:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Feb 28 15:19:41 fir-md1-s1 
kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 15:19:41 fir-md1-s1 kernel: Lustre: Skipped 63 previous similar messages Feb 28 15:20:58 fir-md1-s1 kernel: Pid: 48717, comm: mdt00_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:20:58 fir-md1-s1 kernel: Call Trace: Feb 28 15:20:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:20:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:20:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:20:58 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:20:58 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:20:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:20:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:20:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:20:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:20:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:20:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:20:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396058.48717 Feb 28 15:20:59 fir-md1-s1 kernel: Pid: 21244, comm: mdt02_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 
2018 Feb 28 15:20:59 fir-md1-s1 kernel: Call Trace: Feb 28 15:20:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:20:59 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:20:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:20:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:20:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396059.21244 Feb 28 15:20:59 fir-md1-s1 kernel: Pid: 21968, comm: mdt00_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:20:59 fir-md1-s1 kernel: Call Trace: Feb 28 15:20:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:20:59 fir-md1-s1 
kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:20:59 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:20:59 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:21:00 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:21:00 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:21:00 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:21:00 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:21:00 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:21:00 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:21:00 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396060.21968 Feb 28 15:21:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 15:21:23 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 28 15:22:16 fir-md1-s1 kernel: LNet: Service thread pid 22248 was inactive for 1200.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Feb 28 15:22:16 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396136.22248 Feb 28 15:22:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396147.47905 Feb 28 15:22:36 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396156.22286 Feb 28 15:22:44 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396164.22278 Feb 28 15:23:24 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:25:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 15:25:51 fir-md1-s1 kernel: Lustre: Skipped 69 previous similar messages Feb 28 15:27:02 fir-md1-s1 kernel: Pid: 21925, comm: mdt02_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:27:02 fir-md1-s1 kernel: Call Trace: Feb 28 15:27:02 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:27:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:27:02 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:27:02 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:27:02 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:27:02 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] 
tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:27:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:27:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:27:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396423.21925 Feb 28 15:27:03 fir-md1-s1 kernel: Pid: 48711, comm: mdt00_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:27:03 fir-md1-s1 kernel: Call Trace: Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:27:03 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:27:03 fir-md1-s1 kernel: [] 
ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:27:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:27:03 fir-md1-s1 kernel: Pid: 22160, comm: mdt01_057 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:27:03 fir-md1-s1 kernel: Call Trace: Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:27:03 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:27:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:27:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:27:03 fir-md1-s1 kernel: Pid: 21888, comm: mdt03_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:27:03 fir-md1-s1 kernel: Call Trace: Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:27:03 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:27:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:27:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:27:03 fir-md1-s1 kernel: Pid: 21245, comm: mdt02_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:27:03 fir-md1-s1 kernel: Call Trace: Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:27:03 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 
[mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:27:03 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:27:03 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:27:03 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:27:03 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:27:03 fir-md1-s1 kernel: LNet: Service thread pid 22232 was inactive for 1202.61s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Feb 28 15:27:03 fir-md1-s1 kernel: LNet: Skipped 3 previous similar messages Feb 28 15:27:18 fir-md1-s1 kernel: LustreError: 22236:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551396138, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cb1338bde80/0xb7044c65c80087c0 lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 239 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 22236 timeout: 0 lvb_type: 0 Feb 28 15:27:45 fir-md1-s1 kernel: LustreError: 22257:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551396165, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9cc7e1bc1680/0xb7044c65c802fa2b lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 70 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c802fa32 expref: -99 pid: 22257 timeout: 0 lvb_type: 0 Feb 28 15:27:45 fir-md1-s1 kernel: LustreError: 22257:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 14 previous similar messages Feb 28 15:27:52 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396472.22254 Feb 28 15:28:33 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396513.21980 Feb 28 15:28:37 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:29:42 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 15:29:42 fir-md1-s1 kernel: Lustre: Skipped 60 previous similar messages Feb 28 15:30:03 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396603.21248 Feb 28 15:30:08 fir-md1-s1 kernel: Lustre: 48713:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time 
(5/-150), not sending early reply#012 req@ffff9ca25c0b8f00 x1626239119756416/t0(0) o36->043c3b68-4c87-dea4-1a9b-f14ca335337c@10.9.112.12@o2ib4:717/0 lens 528/2888 e 0 to 0 dl 1551396612 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 15:30:08 fir-md1-s1 kernel: Lustre: 48713:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 15 previous similar messages Feb 28 15:30:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396623.21761 Feb 28 15:32:42 fir-md1-s1 kernel: LNet: Service thread pid 21928 was inactive for 1203.24s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Feb 28 15:32:42 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Feb 28 15:32:42 fir-md1-s1 kernel: Pid: 21928, comm: mdt02_028 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:32:42 fir-md1-s1 kernel: Call Trace: Feb 28 15:32:42 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:32:42 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:32:42 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:32:42 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:32:42 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:32:42 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:32:43 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:32:43 
fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:32:43 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:32:43 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:32:43 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:32:43 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:32:43 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396763.21928 Feb 28 15:33:07 fir-md1-s1 kernel: Pid: 21883, comm: mdt00_013 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:33:07 fir-md1-s1 kernel: Call Trace: Feb 28 15:33:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:33:07 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:33:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:33:07 fir-md1-s1 kernel: [] 
0xffffffffffffffff Feb 28 15:33:07 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551396787.21883 Feb 28 15:33:07 fir-md1-s1 kernel: Pid: 21875, comm: mdt03_010 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:33:07 fir-md1-s1 kernel: Call Trace: Feb 28 15:33:07 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:33:07 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:33:07 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:33:07 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:33:07 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:33:07 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:35:52 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 15:35:52 fir-md1-s1 kernel: Lustre: Skipped 81 previous similar messages Feb 28 15:37:37 fir-md1-s1 kernel: Pid: 51457, comm: mdt03_050 
3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:37:37 fir-md1-s1 kernel: Call Trace: Feb 28 15:37:37 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:37:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:37:37 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:37:37 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:37:37 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:37:37 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:37:37 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:37:37 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:37:37 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:37:37 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:37:37 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:37:37 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397057.51457 Feb 28 15:37:41 fir-md1-s1 kernel: Pid: 22148, comm: mdt01_051 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:37:41 fir-md1-s1 kernel: Call Trace: Feb 28 15:37:41 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:37:41 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:37:41 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:37:41 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:37:41 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:37:41 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:37:41 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:37:42 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:37:42 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:37:42 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:37:42 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:37:42 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397062.22148 Feb 28 15:38:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.52@o2ib7, removing former export from same NID Feb 28 15:38:23 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 15:38:26 fir-md1-s1 kernel: Pid: 21767, comm: mdt01_012 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:38:26 fir-md1-s1 kernel: Call Trace: Feb 28 15:38:26 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:38:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:38:26 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:38:26 fir-md1-s1 kernel: [] 
osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:38:26 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:38:26 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:38:26 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:38:27 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:38:27 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:38:27 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:38:27 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:38:27 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:38:27 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:38:27 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397107.21767 Feb 28 15:38:34 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:38:34 fir-md1-s1 kernel: LustreError: 22280:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551396814, 300s ago), entering recovery for fir-MDT0000_UUID@10.0.10.51@o2ib7 ns: fir-MDT0000-osp-MDT0002 lock: ffff9c9645d369c0/0xb7044c65c8240684 lrc: 4/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 71 type: IBT flags: 0x1000001000000 nid: local remote: 0xb7044c65c824068b expref: -99 pid: 22280 timeout: 0 lvb_type: 0 Feb 28 15:38:34 fir-md1-s1 kernel: LustreError: 
22280:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Feb 28 15:38:59 fir-md1-s1 kernel: Pid: 22141, comm: mdt01_047 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:38:59 fir-md1-s1 kernel: Call Trace: Feb 28 15:38:59 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:38:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:38:59 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:38:59 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:38:59 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:38:59 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:38:59 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:38:59 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:38:59 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:38:59 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:38:59 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:38:59 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397139.22141 Feb 28 15:39:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: Client ccbcd80f-5d0b-d5f6-c884-d5d4fbcfa529 (at 10.9.108.51@o2ib4) reconnecting Feb 28 15:39:44 fir-md1-s1 kernel: Lustre: Skipped 73 previous similar messages Feb 
28 15:41:23 fir-md1-s1 kernel: Pid: 22182, comm: mdt01_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:41:23 fir-md1-s1 kernel: Call Trace: Feb 28 15:41:23 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:41:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:41:23 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:41:23 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:41:23 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:41:23 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:41:23 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:41:23 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:41:23 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:41:23 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:41:23 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:41:23 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397283.22182 Feb 28 15:42:20 fir-md1-s1 kernel: Pid: 21899, comm: mdt01_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:42:20 fir-md1-s1 kernel: Call Trace: Feb 28 15:42:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] 
ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:42:20 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:42:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:42:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:42:20 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397340.21899 Feb 28 15:42:20 fir-md1-s1 kernel: Pid: 22236, comm: mdt01_090 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:42:20 fir-md1-s1 kernel: Call Trace: Feb 28 15:42:20 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] 
mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:42:20 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:42:20 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:42:20 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:42:20 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:42:24 fir-md1-s1 kernel: LNet: Service thread pid 22220 was inactive for 1202.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Feb 28 15:42:24 fir-md1-s1 kernel: LNet: Skipped 7 previous similar messages Feb 28 15:42:24 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397344.22220 Feb 28 15:42:49 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397369.22257 Feb 28 15:43:31 fir-md1-s1 kernel: LNet: Service thread pid 22261 was inactive for 200.56s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Feb 28 15:43:31 fir-md1-s1 kernel: LNet: Skipped 9 previous similar messages Feb 28 15:43:31 fir-md1-s1 kernel: Pid: 22261, comm: mdt02_074 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:43:31 fir-md1-s1 kernel: Call Trace: Feb 28 15:43:31 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:43:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:43:31 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:43:31 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:43:31 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:43:31 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:43:31 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:43:31 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:43:31 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:43:31 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:43:31 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:43:31 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397411.22261 Feb 28 15:43:38 fir-md1-s1 kernel: Pid: 50316, comm: mdt02_090 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:43:38 fir-md1-s1 kernel: Call Trace: Feb 28 
15:43:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:43:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:43:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:43:38 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 15:43:38 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:43:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:43:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:43:38 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:43:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:43:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:43:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:43:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397418.50316 Feb 28 15:43:58 fir-md1-s1 kernel: Pid: 21935, comm: mdt00_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:43:58 fir-md1-s1 kernel: Call Trace: Feb 28 15:43:58 fir-md1-s1 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Feb 28 15:43:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc] Feb 28 15:43:58 fir-md1-s1 kernel: [] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc] Feb 28 15:43:58 fir-md1-s1 kernel: [] osp_md_object_lock+0x162/0x2d0 [osp] Feb 28 
15:43:58 fir-md1-s1 kernel: [] lod_object_lock+0xf3/0x7b0 [lod] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdd_object_lock+0x3e/0xe0 [mdd] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_remote_object_lock_try+0x1e1/0x750 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_remote_object_lock+0x2a/0x30 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x362/0x860 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:43:58 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:43:58 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:43:58 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:43:58 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:43:58 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:43:58 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:43:58 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397438.21935 Feb 28 15:44:48 fir-md1-s1 kernel: LustreError: 21242:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1551397188, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffff9cac936b8fc0/0xb7044c65c837f16a lrc: 3/0,1 mode: --/EX res: [0x200000004:0x1:0x0].0x0 bits 0x2/0x0 rrc: 251 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 21242 timeout: 0 lvb_type: 0 Feb 28 15:45:10 fir-md1-s1 kernel: Lustre: fir-MDT0000-osp-MDT0002: Connection to fir-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Feb 28 15:45:29 fir-md1-s1 kernel: Pid: 22146, comm: mdt02_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:45:29 
fir-md1-s1 kernel: Call Trace: Feb 28 15:45:29 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:45:29 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:45:29 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:45:29 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:45:29 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:45:29 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:45:29 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:45:29 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:45:29 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:45:29 fir-md1-s1 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:45:29 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:45:29 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:45:29 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:45:29 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397529.22146 Feb 28 15:45:45 fir-md1-s1 kernel: Pid: 50351, comm: mdt02_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Feb 28 15:45:45 fir-md1-s1 kernel: Call Trace: Feb 28 15:45:45 fir-md1-s1 kernel: [] ldlm_completion_ast+0x5b1/0x920 [ptlrpc] Feb 28 15:45:45 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Feb 28 15:45:45 fir-md1-s1 kernel: [] mdt_reint_rename_or_migrate.isra.51+0x67a/0x860 [mdt] Feb 28 15:45:45 fir-md1-s1 kernel: [] mdt_reint_rename+0x13/0x20 [mdt] Feb 28 15:45:45 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Feb 28 15:45:45 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Feb 28 15:45:45 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Feb 28 15:45:45 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Feb 28 15:45:45 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Feb 28 15:45:45 fir-md1-s1 kernel: 
[] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Feb 28 15:45:45 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Feb 28 15:45:45 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Feb 28 15:45:45 fir-md1-s1 kernel: [] 0xffffffffffffffff Feb 28 15:45:45 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1551397545.50351 Feb 28 15:45:53 fir-md1-s1 kernel: Lustre: fir-MDT0002: Connection restored to b95d5169-3bf7-053b-2221-3f09acd35903 (at 10.9.108.38@o2ib4) Feb 28 15:45:53 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Feb 28 15:46:04 fir-md1-s1 kernel: Lustre: 22178:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-150), not sending early reply#012 req@ffff9c950c746900 x1626139274416880/t0(0) o36->a8a05f8e-db8d-59cc-6120-95d964b973fe@10.8.8.34@o2ib6:164/0 lens 528/2888 e 0 to 0 dl 1551397569 ref 2 fl Interpret:/0/0 rc 0/0 Feb 28 15:46:04 fir-md1-s1 kernel: Lustre: 22178:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 8 previous similar messages Feb 28 15:47:14 fir-md1-s1 kernel: Lustre: Failing over fir-MDT0000 Feb 28 15:47:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 15:47:14 fir-md1-s1 kernel: LNet: Service thread pid 21240 completed after 4224.32s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0000-osp-MDT0002: operation ldlm_cancel to node 0@lo failed: rc = -19 Feb 28 15:47:14 fir-md1-s1 kernel: Lustre: 21885:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (37265:164558s); client may timeout. 
req@ffff9c9b12636300 x1626126239498464/t56194458294(0) o36->06a1a62f-4958-2183-fc0f-5921522e8fe4@10.9.101.47@o2ib4:591/0 lens 544/424 e 24 to 0 dl 1551233076 ref 1 fl Complete:/0/0 rc -19/-19 Feb 28 15:47:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.108.10@o2ib4 (stopping) Feb 28 15:47:14 fir-md1-s1 kernel: Lustre: Skipped 211 previous similar messages Feb 28 15:47:14 fir-md1-s1 kernel: LNet: Skipped 15 previous similar messages Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 21983:0:(mdt_reint.c:2603:mdt_reint_rename_or_migrate()) fir-MDT0002: can't lock FS for rename: rc = -5 Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 105929:0:(ldlm_resource.c:1146:ldlm_resource_complain()) fir-MDT0000-osp-MDT0002: namespace resource [0x200000004:0x1:0x0].0x0 (ffff9cb40c2a6840) refcount nonzero (72) after lock cleanup; forcing cleanup. Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 21983:0:(mdt_reint.c:2603:mdt_reint_rename_or_migrate()) Skipped 66 previous similar messages Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 22251:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9ca006b86600 x1625960654498608/t0(0) o104->fir-MDT0000@10.9.108.45@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 51470:0:(osp_object.c:594:osp_attr_get()) fir-MDT0001-osp-MDT0000:osp_attr_get update error [0x240000402:0x5:0x0]: rc = -5 Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 22251:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 11 previous similar messages Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 129575:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.0.82@o2ib6 arrived at 1551397634 with bad export cookie 13187749600596214698 Feb 28 15:47:14 fir-md1-s1 kernel: LustreError: 129575:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 1 previous similar message Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 
7584:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.107.52@o2ib4 arrived at 1551397635 with bad export cookie 13187749596893464745 Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 129084:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff9cc907637500 x1625960654499312/t0(0) o41->fir-MDT0001-osp-MDT0000@10.0.10.52@o2ib7:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 105930:0:(ldlm_resource.c:1146:ldlm_resource_complain()) mdt-fir-MDT0000_UUID: namespace resource [0x200006a0b:0x29e:0x0].0x0 (ffff9cb43969e0c0) refcount nonzero (1) after lock cleanup; forcing cleanup. Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 105930:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0003-osp-MDT0002: operation mds_statfs to node 10.0.10.52@o2ib7 failed: rc = -107 Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0001-osp-MDT0002: operation mds_statfs to node 10.0.10.52@o2ib7 failed: rc = -107 Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: Skipped 2 previous similar messages Feb 28 15:47:15 fir-md1-s1 kernel: LustreError: 129084:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 10 previous similar messages Feb 28 15:47:16 fir-md1-s1 kernel: LustreError: 105929:0:(osp_object.c:594:osp_attr_get()) fir-MDT0000-osp-MDT0002:osp_attr_get update error [0x20000000a:0x0:0x0]: rc = -5 Feb 28 15:47:16 fir-md1-s1 kernel: LustreError: 105929:0:(llog_cat.c:424:llog_cat_close()) fir-MDT0000-osp-MDT0002: failure destroying log during cleanup: rc = -5 Feb 28 15:47:16 fir-md1-s1 kernel: LustreError: 105929:0:(llog_cat.c:424:llog_cat_close()) Skipped 3 previous similar messages Feb 28 15:47:16 fir-md1-s1 kernel: LustreError: 129575:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.8.17.4@o2ib6 
arrived at 1551397636 with bad export cookie 13187749604539032059 Feb 28 15:47:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Not available for connect from 10.8.6.15@o2ib6 (stopping) Feb 28 15:47:18 fir-md1-s1 kernel: Lustre: Skipped 524 previous similar messages Feb 28 15:47:19 fir-md1-s1 kernel: LustreError: 7584:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.107.40@o2ib4 arrived at 1551397639 with bad export cookie 13187749592613520230 Feb 28 15:47:19 fir-md1-s1 kernel: LustreError: 7584:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 4 previous similar messages Feb 28 15:47:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.20.8@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Feb 28 15:47:21 fir-md1-s1 kernel: LustreError: Skipped 2870 previous similar messages Feb 28 15:47:21 fir-md1-s1 kernel: Lustre: server umount fir-MDT0002 complete Feb 28 15:47:23 fir-md1-s1 kernel: LustreError: 129577:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.115.8@o2ib4 arrived at 1551397643 with bad export cookie 13187749593058504266 Feb 28 15:47:23 fir-md1-s1 kernel: LustreError: 129577:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 8 previous similar messages Feb 28 15:47:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: Not available for connect from 10.9.105.20@o2ib4 (stopping) Feb 28 15:47:26 fir-md1-s1 kernel: Lustre: Skipped 694 previous similar messages Feb 28 15:47:33 fir-md1-s1 kernel: LustreError: 129575:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) ldlm_cancel from 10.9.101.12@o2ib4 arrived at 1551397653 with bad export cookie 13187749591102004455 Feb 28 15:47:33 fir-md1-s1 kernel: LustreError: 129575:0:(ldlm_lockd.c:2322:ldlm_cancel_handler()) Skipped 8 previous similar messages Feb 28 15:47:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.9.106.49@o2ib4 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Feb 28 15:47:37 fir-md1-s1 kernel: LustreError: Skipped 886 previous similar messages Feb 28 15:47:42 fir-md1-s1 kernel: Lustre: fir-MDT0000: Not available for connect from 10.9.114.15@o2ib4 (stopping) Feb 28 15:47:42 fir-md1-s1 kernel: Lustre: Skipped 684 previous similar messages Feb 28 15:47:52 fir-md1-s1 kernel: Lustre: server umount fir-MDT0000 complete Feb 28 15:48:30 fir-md1-s1 kernel: LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5 Feb 28 15:48:30 fir-md1-s1 kernel: LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5 Feb 28 15:48:30 fir-md1-s1 kernel: LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Feb 28 15:48:30 fir-md1-s1 kernel: LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc Feb 28 15:48:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.103.44@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server. 
Feb 28 15:48:31 fir-md1-s1 kernel: LustreError: Skipped 1066 previous similar messages Feb 28 15:48:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Not available for connect from 10.9.101.68@o2ib4 (not set up) Feb 28 15:48:31 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Feb 28 15:48:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Feb 28 15:48:31 fir-md1-s1 kernel: Lustre: fir-MDD0002: changelog on Feb 28 15:48:31 fir-md1-s1 kernel: Lustre: fir-MDT0002: in recovery but waiting for the first client to connect Feb 28 15:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1416 clients reconnect Feb 28 15:48:32 fir-md1-s1 kernel: LustreError: 11-0: fir-MDT0002-osp-MDT0000: operation mds_connect to node 0@lo failed: rc = -114 Feb 28 15:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 Feb 28 15:48:32 fir-md1-s1 kernel: Lustre: fir-MDD0000: changelog on Feb 28 15:48:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: in recovery but waiting for the first client to connect Feb 28 15:48:37 fir-md1-s1 kernel: LustreError: 106534:0:(mdt_open.c:1364:mdt_reint_open()) @@@ OPEN & CREAT not in open replay/by_fid. req@ffff9cb5d6798f00 x1626125528013312/t0(47609506207) o101->c62f588d-16a2-e6d3-4c97-6fab11264ff8@10.8.27.1@o2ib6:312/0 lens 1784/3288 e 0 to 0 dl 1551398472 ref 1 fl Interpret:/4/0 rc 0/0 Feb 28 15:49:28 fir-md1-s1 kernel: LNetError: 129021:0:(o2iblnd_cb.c:3324:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Feb 28 15:49:28 fir-md1-s1 kernel: LNetError: 129021:0:(o2iblnd_cb.c:3399:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (7): c: 0, oc: 0, rc: 8 Feb 28 15:49:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.8.4.26@o2ib6 (no target). 
If you are running an HA pair check that the target is mounted on the other server. Feb 28 15:49:35 fir-md1-s1 kernel: LustreError: Skipped 1720 previous similar messages Feb 28 15:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery already passed deadline 7:44. If you do not want to wait more, please abort the recovery by force. Feb 28 15:50:25 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 28 15:50:25 fir-md1-s1 kernel: Lustre: fir-MDT0002: Recovery over after 1:53, of 1416 clients 1416 recovered and 0 were evicted. Feb 28 15:56:22 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client ef079e70-3cb8-ddc6-a4ca-72d0e87a1b53 (at 10.8.3.33@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd0e3214800, cur 1551398182 expire 1551398032 last 1551397955 Feb 28 15:56:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 16:39:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 16:39:22 fir-md1-s1 kernel: Lustre: Skipped 3003 previous similar messages Feb 28 16:39:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d63a820c-5b1c-a1eb-ab43-eec5d17a84cf (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cd306b07400, cur 1551400785 expire 1551400635 last 1551400558 Feb 28 16:39:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 16:49:54 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 051bc37a-e3f2-75cf-ab37-f2275a37417f (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9caefc7ee400, cur 1551401394 expire 1551401244 last 1551401167 Feb 28 16:49:54 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 16:49:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 16:49:59 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:01:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 17:01:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:01:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 17a5636d-715a-9d16-f13e-89329c5d4dd1 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca9687e6400, cur 1551402106 expire 1551401956 last 1551401879 Feb 28 17:01:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:27:51 fir-md1-s1 kernel: Lustre: 107058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551403663/real 1551403663] req@ffff9cab81af4200 x1625960684342336/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551403670 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 28 17:27:51 fir-md1-s1 kernel: Lustre: 107058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Feb 28 17:28:05 fir-md1-s1 kernel: Lustre: 107058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551403678/real 1551403678] req@ffff9cab81af4200 x1625960684342336/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551403685 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 17:28:05 fir-md1-s1 kernel: Lustre: 107058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 28 17:28:26 fir-md1-s1 kernel: Lustre: 107058:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 
1551403699/real 1551403699] req@ffff9cab81af4200 x1625960684342336/t0(0) o104->fir-MDT0002@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551403706 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 17:28:26 fir-md1-s1 kernel: Lustre: 107058:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 28 17:28:51 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 735f64a7-57a4-0952-f522-322b28de1841 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbfd672bc00, cur 1551403731 expire 1551403581 last 1551403504 Feb 28 17:28:51 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:29:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 17:29:31 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:45:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 874c608b-d57a-70f2-6a7e-5e9af693e5b6 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9ccfe5f03000, cur 1551404708 expire 1551404558 last 1551404481 Feb 28 17:45:08 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:46:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 28 17:46:20 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:51:39 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551405092/real 1551405092] req@ffff9cb8e7326300 x1625960688354096/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551405099 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 28 17:51:39 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Feb 28 17:51:46 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551405099/real 1551405099] req@ffff9cb8e7326300 x1625960688354096/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551405106 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 17:52:00 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551405113/real 1551405113] req@ffff9cb8e7326300 x1625960688354096/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551405120 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 17:52:00 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 28 17:52:21 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551405134/real 1551405134] req@ffff9cb8e7326300 x1625960688354096/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551405141 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 17:52:21 fir-md1-s1 kernel: Lustre: 
106970:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 28 17:53:03 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551405176/real 1551405176] req@ffff9cb8e7326300 x1625960688354096/t0(0) o104->fir-MDT0002@10.8.3.11@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551405183 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 17:53:03 fir-md1-s1 kernel: Lustre: 106970:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages Feb 28 17:54:06 fir-md1-s1 kernel: LustreError: 106970:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.3.11@o2ib6) failed to reply to blocking AST (req@ffff9cb8e7326300 x1625960688354096 status 0 rc -110), evict it ns: mdt-fir-MDT0002_UUID lock: ffff9cbfd4e23840/0xb7044c65d094cc58 lrc: 4/0,0 mode: PR/PR res: [0x2c0001762:0x1a32d:0x0].0x0 bits 0x13/0x0 rrc: 32 type: IBT flags: 0x60200400000020 nid: 10.8.3.11@o2ib6 remote: 0x80cfb2393cc1a413 expref: 101 pid: 106893 timeout: 1390368 lvb_type: 0 Feb 28 17:54:06 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0002: A client on nid 10.8.3.11@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Feb 28 17:54:06 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.3.11@o2ib6 ns: mdt-fir-MDT0002_UUID lock: ffff9cbfd4e23840/0xb7044c65d094cc58 lrc: 3/0,0 mode: PR/PR res: [0x2c0001762:0x1a32d:0x0].0x0 bits 0x13/0x0 rrc: 32 type: IBT flags: 0x60200400000020 nid: 10.8.3.11@o2ib6 remote: 0x80cfb2393cc1a413 expref: 102 pid: 106893 timeout: 0 lvb_type: 0 Feb 28 17:54:46 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client ea5e9f07-25ae-beb4-d870-65c23fd540f1 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbe092c2c00, cur 1551405286 expire 1551405136 last 1551405059 Feb 28 17:54:46 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:54:55 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 8211e459-b683-01cd-d93d-ad1086360141 (at 10.8.20.15@o2ib6) in 166 seconds. I think it's dead, and I am evicting it. exp ffff9cabaf24cc00, cur 1551405295 expire 1551405145 last 1551405129 Feb 28 17:54:55 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Feb 28 17:56:11 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 17:56:11 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 17:56:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 28 17:56:45 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:01:44 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 219eae0c-281f-10af-ca3f-1a583864f51b (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc9b56b8800, cur 1551405704 expire 1551405554 last 1551405477 Feb 28 18:04:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 74a03b4f-d0c9-a84a-2aa4-fc50ef9db767 (at 10.8.11.9@o2ib6) Feb 28 18:04:06 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:05:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to fe28a108-baf3-cd9c-d6ff-86b12f332cdd (at 10.8.11.10@o2ib6) Feb 28 18:05:18 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:10:57 fir-md1-s1 kernel: EXT4-fs (sdk2): error count since last fsck: 5 Feb 28 18:10:57 fir-md1-s1 kernel: EXT4-fs (sdk2): initial error at time 1550022155: ext4_mb_generate_buddy:757 Feb 28 18:10:57 fir-md1-s1 kernel: EXT4-fs (sdk2): last error at time 1550448029: ext4_mb_generate_buddy:757 Feb 28 18:17:01 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 55d5b00f-d291-c9bf-b408-e69a398fc734 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cb4f5c44800, cur 1551406621 expire 1551406471 last 1551406394 Feb 28 18:17:01 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Feb 28 18:17:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 55d5b00f-d291-c9bf-b408-e69a398fc734 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbdaef67000, cur 1551406628 expire 1551406478 last 1551406401 Feb 28 18:18:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 260e6d40-cd5d-3426-6f87-ac9149a2548c (at 10.8.3.11@o2ib6) in 211 seconds. I think it's dead, and I am evicting it. exp ffff9c9a6fb77800, cur 1551406697 expire 1551406547 last 1551406486 Feb 28 18:18:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 18:18:24 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client dc48ebf7-abf3-e897-26e8-06b758e96c34 (at 10.8.3.11@o2ib6) in 218 seconds. I think it's dead, and I am evicting it. 
exp ffff9c9855858400, cur 1551406704 expire 1551406554 last 1551406486 Feb 28 18:18:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 18:18:37 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:19:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 28 18:19:42 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:28:08 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 5903b369-0b86-edb5-e071-aa9e61da6bb0 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9c9d62c7a400, cur 1551407288 expire 1551407138 last 1551407061 Feb 28 18:28:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 18:30:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 28 18:30:07 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:37:43 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 52aad3c7-c216-4118-15d4-ad02110417c1 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cc4c87b5800, cur 1551407863 expire 1551407713 last 1551407636 Feb 28 18:37:43 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:38:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.3.11@o2ib6) Feb 28 18:38:44 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:50:46 fir-md1-s1 kernel: Lustre: 106796:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551408639/real 1551408639] req@ffff9cab7bac7b00 x1625960698585120/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551408646 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Feb 28 18:50:46 fir-md1-s1 kernel: Lustre: 106796:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Feb 28 18:51:00 fir-md1-s1 kernel: Lustre: 106796:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551408653/real 1551408653] req@ffff9cab7bac7b00 x1625960698585120/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551408660 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 18:51:00 fir-md1-s1 kernel: Lustre: 106796:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Feb 28 18:51:21 fir-md1-s1 kernel: Lustre: 106796:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1551408674/real 1551408674] req@ffff9cab7bac7b00 x1625960698585120/t0(0) o104->fir-MDT0000@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1551408681 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Feb 28 18:51:21 fir-md1-s1 kernel: Lustre: 106796:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Feb 28 18:51:55 fir-md1-s1 kernel: LustreError: 106796:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) returned error from blocking AST (req@ffff9cab7bac7b00 x1625960698585120 status -107 rc -107), evict it ns: mdt-fir-MDT0000_UUID lock: 
ffff9ca671a7b180/0xb7044c65d395fcea lrc: 4/0,0 mode: PR/PR res: [0x2000069f2:0xf81e:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x76ac0d7229a63d0d expref: 19 pid: 106675 timeout: 1393838 lvb_type: 0 Feb 28 18:51:55 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 18:51:55 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 18:51:55 fir-md1-s1 kernel: LustreError: 138-a: fir-MDT0000: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -107 Feb 28 18:51:55 fir-md1-s1 kernel: LustreError: 129189:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 76s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffff9ca671a7b180/0xb7044c65d395fcea lrc: 3/0,0 mode: PR/PR res: [0x2000069f2:0xf81e:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x76ac0d7229a63d0d expref: 20 pid: 106675 timeout: 0 lvb_type: 0 Feb 28 18:52:33 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client 3942d1b2-563b-e42f-e0de-7fcc8cc012d8 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9ca78db9b400, cur 1551408753 expire 1551408603 last 1551408526 Feb 28 18:52:33 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 19:11:14 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f678c693-9d20-da73-5e2a-d31d400bf797 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff9cbdb4f4f000, cur 1551409874 expire 1551409724 last 1551409647 Feb 28 19:11:14 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Feb 28 19:12:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to ce807089-fd76-9353-e0a7-32c622e380eb (at 10.8.20.15@o2ib6) Feb 28 19:12:32 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Feb 28 19:17:16 fir-md1-s1 kernel: Lustre: fir-MDT0002: haven't heard from client f0ec523a-6a30-fc32-d953-11222af5eaf8 (at 10.8.3.11@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9cbda8b5f000, cur 1551410236 expire 1551410086 last 1551410009 Feb 28 19:17:16 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages