Mar 18 18:55:04 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 6899a6ff-b1c1-d6d5-9839-98e0b5531d18 (at 10.8.15.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ed737b0f400, cur 1552960504 expire 1552960354 last 1552960277
Mar 18 18:55:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 18 22:50:10 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 506c2262-4cca-3894-72db-d5ccf2174a66 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ee02a557800, cur 1552974610 expire 1552974460 last 1552974383
Mar 18 22:50:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 18 22:51:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 18 22:51:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 18 22:59:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 2f0937f7-8221-72c5-644a-212dfc93799d (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecbe4374800, cur 1552975162 expire 1552975012 last 1552974935
Mar 18 22:59:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 18 23:00:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 18 23:00:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 18 23:42:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 91ae8bca-d651-0daa-5d63-b74e222862a5 (at 10.8.28.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ef0bfc96400, cur 1552977745 expire 1552977595 last 1552977518
Mar 18 23:42:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 00:15:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 91ae8bca-d651-0daa-5d63-b74e222862a5 (at 10.8.28.4@o2ib6)
Mar 19 00:15:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 01:51:59 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 4af51c32-5433-8946-8fc2-4f1ad9c274b4 (at 10.8.14.8@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ed4ba3f7800, cur 1552985519 expire 1552985369 last 1552985292
Mar 19 01:51:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 02:42:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client a7e9d99b-3529-3851-2fc1-3e109f24d099 (at 10.8.10.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ef04aab6c00, cur 1552988559 expire 1552988409 last 1552988332
Mar 19 02:42:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 02:44:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.9@o2ib6)
Mar 19 02:44:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 04:42:45 fir-md1-s2 kernel: LustreError: 91778:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000ecc6:0x14dd3:0x0]
Mar 19 06:37:35 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d251a63a-cf5f-7b49-f078-0f10ea97d229 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ee0324cc000, cur 1553002655 expire 1553002505 last 1553002428
Mar 19 06:37:35 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 06:40:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6)
Mar 19 06:40:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 06:45:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3923848c-a52c-adc9-c644-e9ae293e136d (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ed7cee7e800, cur 1553003152 expire 1553003002 last 1553002925
Mar 19 06:45:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 07:01:15 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 016bfca4-4707-5e56-8faf-ae2a629ca193 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecb38769400, cur 1553004075 expire 1553003925 last 1553003848
Mar 19 07:01:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 07:06:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 07:06:15 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 07:37:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6)
Mar 19 07:37:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 07:42:41 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b19f31e9-0c9f-7e7b-0ceb-f7adedb279ca (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecf808b0c00, cur 1553006561 expire 1553006411 last 1553006334
Mar 19 07:42:41 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 07:43:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 07:43:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:15:58 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client f8edd23b-4bf8-0fec-9fdf-f7199ad12e29 (at 10.8.14.5@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eea55b92400, cur 1553012158 expire 1553012008 last 1553011931
Mar 19 09:15:58 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:26:51 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c6090610-4cd1-706d-5e96-56b48e513e07 (at 10.8.11.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eefcaecc400, cur 1553012811 expire 1553012661 last 1553012584
Mar 19 09:26:51 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:28:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3f756571-fb33-26bd-8c78-0a6a3d59beda (at 10.8.26.12@o2ib6) in 206 seconds. I think it's dead, and I am evicting it. exp ffff8ec79e24b400, cur 1553012887 expire 1553012737 last 1553012681
Mar 19 09:28:07 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages
Mar 19 09:28:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to e0239d39-e008-34fc-e33b-07dcd3284bae (at 10.8.1.27@o2ib6)
Mar 19 09:28:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:28:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3f756571-fb33-26bd-8c78-0a6a3d59beda (at 10.8.26.12@o2ib6)
Mar 19 09:28:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:29:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.24@o2ib6)
Mar 19 09:29:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:29:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.16.8@o2ib6)
Mar 19 09:29:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:29:31 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to c8cce669-fcac-6864-9507-a878eae579b3 (at 10.8.11.1@o2ib6)
Mar 19 09:29:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:29:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3a2b662e-54dc-3395-c681-160035133363 (at 10.8.10.36@o2ib6)
Mar 19 09:29:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:30:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f88fc651-dec1-3e75-cd66-a5d02a648ff4 (at 10.8.13.10@o2ib6)
Mar 19 09:30:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:48:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 434948ee-bfe5-5d8b-0ab2-6420dff4bd4f (at 10.8.10.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecd1feaf800, cur 1553014093 expire 1553013943 last 1553013866
Mar 19 09:48:13 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages
Mar 19 09:50:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.10.9@o2ib6)
Mar 19 09:50:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:56:34 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 5b24b7ad-6ae1-07e6-38c8-53550e03f8cb (at 10.9.101.55@o2ib4)
Mar 19 09:56:34 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:57:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 094b9592-6ec3-1e37-ba8c-a34d95123677 (at 10.8.12.10@o2ib6)
Mar 19 09:57:40 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 09:58:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.8.29@o2ib6)
Mar 19 09:58:12 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 10:00:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 14c95501-239d-98ed-7623-c64614da3a71 (at 10.8.11.27@o2ib6)
Mar 19 10:00:20 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 10:14:08 fir-md1-s2 kernel: Lustre: 91055:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553015641/real 1553015641] req@ffff8ec433381200 x1628271162704176/t0(0) o104->fir-MDT0003@10.8.9.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553015648 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 19 10:14:15 fir-md1-s2 kernel: Lustre: 91055:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553015648/real 1553015648] req@ffff8ec433381200 x1628271162704176/t0(0) o104->fir-MDT0003@10.8.9.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553015655 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 19 10:14:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 0e6edbf0-d4e4-2bfd-8d10-ebcc53c0105c (at 10.8.9.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec01803d400, cur 1553015657 expire 1553015507 last 1553015430
Mar 19 10:14:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 10:14:22 fir-md1-s2 kernel: Lustre: 91055:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553015655/real 1553015655] req@ffff8ec433381200 x1628271162704176/t0(0) o104->fir-MDT0003@10.8.9.2@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553015662 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 19 10:14:26 fir-md1-s2 kernel: Lustre: 91486:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec303eee600 x1626680993588592/t0(0) o36->9346a7a8-dfc7-3cc1-3897-55b48b060d94@10.8.6.36@o2ib6:1/0 lens 504/448 e 0 to 0 dl 1553015671 ref 2 fl Interpret:/0/0 rc 0/0
Mar 19 10:14:29 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0e6edbf0-d4e4-2bfd-8d10-ebcc53c0105c (at 10.8.9.2@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ee004056800, cur 1553015669 expire 1553015519 last 1553015442
Mar 19 10:36:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.14.5@o2ib6)
Mar 19 10:36:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 10:40:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a557acba-5c99-b832-b8c9-7f1df36fe8bc (at 10.9.101.55@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8edcffb43000, cur 1553017205 expire 1553017055 last 1553016978
Mar 19 10:54:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.15@o2ib6)
Mar 19 10:54:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 10:54:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client a706839b-701f-5b4f-f0e7-4b38d85f93fa (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8edfdfcd2c00, cur 1553018072 expire 1553017922 last 1553017845
Mar 19 10:54:32 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 11:06:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 5b24b7ad-6ae1-07e6-38c8-53550e03f8cb (at 10.9.101.55@o2ib4)
Mar 19 11:06:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 11:08:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to c6309390-d5ff-3002-57f3-e7314f018ae2 (at 10.9.101.29@o2ib4)
Mar 19 11:08:06 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 11:08:14 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c6309390-d5ff-3002-57f3-e7314f018ae2 (at 10.9.101.29@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ed078435000, cur 1553018894 expire 1553018744 last 1553018667
Mar 19 11:08:14 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 11:36:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.15@o2ib6)
Mar 19 11:36:44 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.20.15@o2ib6)
Mar 19 11:37:05 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 0017ebfd-95eb-fbc6-6ddb-e28993d1a90c (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8edc6a6df800, cur 1553020625 expire 1553020475 last 1553020398
Mar 19 11:37:05 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Mar 19 11:39:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.15@o2ib6)
Mar 19 11:39:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 11:40:31 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client c7826685-0cd6-a002-cd95-0ea1ac8fe136 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecbc7291000, cur 1553020831 expire 1553020681 last 1553020604
Mar 19 11:40:31 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 11:54:38 fir-md1-s2 kernel: LustreError: 90907:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x280008e8f:0x6bba:0x0]
Mar 19 12:00:57 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 3c73dbf8-3b9c-e12d-8e88-82999a4eea2b (at 10.8.15.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ef04e81fc00, cur 1553022057 expire 1553021907 last 1553021830
Mar 19 12:00:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 12:08:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.15@o2ib6)
Mar 19 12:08:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 12:09:22 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 02cd65a8-402f-266d-98fb-e5756f8cce59 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecbc72ed000, cur 1553022562 expire 1553022412 last 1553022335
Mar 19 12:09:22 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 12:13:23 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 97584adb-1840-2125-be9f-ed6ea8df5749 (at 10.8.1.29@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eefb5e7a800, cur 1553022803 expire 1553022653 last 1553022576
Mar 19 12:13:23 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 12:19:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.15.9@o2ib6)
Mar 19 12:19:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 12:53:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 5c6d73c4-6d7c-e986-8930-74829de205ff (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec155b69400, cur 1553025182 expire 1553025032 last 1553024955
Mar 19 12:53:02 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 13:20:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 2c62a935-22f5-687e-61b8-7ce44c516d90 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eca076cd800, cur 1553026846 expire 1553026696 last 1553026619
Mar 19 13:20:46 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 13:22:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 13:22:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 13:34:20 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 9b01daf5-c264-7fbf-6861-9205db47e51b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecf59becc00, cur 1553027660 expire 1553027510 last 1553027433
Mar 19 13:34:20 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Mar 19 13:34:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6)
Mar 19 13:34:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 13:53:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6)
Mar 19 13:53:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 13:54:02 fir-md1-s2 kernel: LustreError: 91343:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) returned error from blocking AST (req@ffff8ecba7ed3f00 x1628271319803904 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff8ecfe1555e80/0xefacb2c01216d6b7 lrc: 4/0,0 mode: PR/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 1053 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xcea6b5a2c4edab57 expref: 120 pid: 91550 timeout: 873535 lvb_type: 0
Mar 19 13:54:02 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -107
Mar 19 13:54:02 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.8.27.23@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff8ecfe1555e80/0xefacb2c01216d6b7 lrc: 3/0,0 mode: PR/PR res: [0x280000401:0x5:0x0].0x0 bits 0x13/0x0 rrc: 1283 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0xcea6b5a2c4edab57 expref: 121 pid: 91550 timeout: 0 lvb_type: 0
Mar 19 13:54:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client f50067e8-8b33-9dca-d9d1-f4010211e6dd (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec385b9a000, cur 1553028889 expire 1553028739 last 1553028662
Mar 19 13:54:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:00:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 828dd8b3-0a36-2519-d144-487922890e9b (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec421a1a800, cur 1553029258 expire 1553029108 last 1553029031
Mar 19 14:01:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6)
Mar 19 14:01:27 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6)
Mar 19 14:11:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 57f53103-f839-81d5-c131-897db97762e8 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ed37bbc7000, cur 1553029864 expire 1553029714 last 1553029637
Mar 19 14:11:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:12:38 fir-md1-s2 kernel: LustreError: 91691:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000ecda:0x2cfc:0x0]
Mar 19 14:15:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 14:15:37 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:17:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ab905b54-126d-719c-68ac-791f0a17153b (at 10.9.108.52@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eb1fa6bdc00, cur 1553030263 expire 1553030113 last 1553030036
Mar 19 14:17:43 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:33:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 386266ad-b29a-a641-0564-58c80858c967 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec536e08400, cur 1553031194 expire 1553031044 last 1553030967
Mar 19 14:33:14 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Mar 19 14:35:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 14:35:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:42:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to ab905b54-126d-719c-68ac-791f0a17153b (at 10.9.108.52@o2ib4)
Mar 19 14:42:11 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:45:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.17.21@o2ib6)
Mar 19 14:45:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:50:21 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b7c4a32e-3559-e63b-51fb-8d7f7fc6637a (at 10.8.26.23@o2ib6)
Mar 19 14:50:21 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:51:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client ff43ce9a-6722-4dfb-851f-4ce9905ab062 (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec419f5e000, cur 1553032285 expire 1553032135 last 1553032058
Mar 19 14:51:25 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:55:17 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 14:55:17 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:56:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6)
Mar 19 14:56:10 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 14:59:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 67fabff9-2aff-6efd-4e69-7d069cb29b1e (at 10.8.26.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eca752f9c00, cur 1553032776 expire 1553032626 last 1553032549
Mar 19 14:59:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:00:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 464aa845-dbed-a8e7-d827-38d5dc3cee22 (at 10.8.14.1@o2ib6)
Mar 19 15:00:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:04:07 fir-md1-s2 kernel: LustreError: 91559:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000ecd7:0x181bd:0x0]
Mar 19 15:04:52 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client e1b13546-795a-0c7f-c0f3-58293da03e68 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ecd51a92c00, cur 1553033092 expire 1553032942 last 1553032865
Mar 19 15:04:52 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:05:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.15@o2ib6)
Mar 19 15:05:45 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:23:47 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 27e8ee0c-14af-405a-1862-22bcb182600a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec294ff3800, cur 1553034227 expire 1553034077 last 1553034000
Mar 19 15:23:47 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:23:53 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 27e8ee0c-14af-405a-1862-22bcb182600a (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec48537d000, cur 1553034233 expire 1553034083 last 1553034006
Mar 19 15:24:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6)
Mar 19 15:24:04 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:30:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 97584adb-1840-2125-be9f-ed6ea8df5749 (at 10.8.1.29@o2ib6)
Mar 19 15:30:00 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:32:03 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b7c4a32e-3559-e63b-51fb-8d7f7fc6637a (at 10.8.26.23@o2ib6)
Mar 19 15:32:03 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:42:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 26e0c53a-4e65-960e-65a8-e6757fba5388 (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ebb47a93000, cur 1553035353 expire 1553035203 last 1553035126
Mar 19 15:49:28 fir-md1-s2 kernel: Lustre: 91218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553035761/real 1553035761] req@ffff8eb97170aa00 x1628271466140544/t0(0) o104->fir-MDT0003@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553035768 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Mar 19 15:49:35 fir-md1-s2 kernel: Lustre: 91218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553035768/real 1553035768] req@ffff8eb97170aa00 x1628271466140544/t0(0) o104->fir-MDT0003@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553035775 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 19 15:49:36 fir-md1-s2 kernel: Lustre: 91267:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8eb8cbb9a100 x1626300676038464/t0(0) o36->f1f7266e-93ed-b149-5bce-99ca2364246b@10.8.10.16@o2ib6:11/0 lens 512/448 e 1 to 0 dl 1553035781 ref 2 fl Interpret:/0/0 rc 0/0
Mar 19 15:49:42 fir-md1-s2 kernel: Lustre: 91218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553035775/real 1553035775] req@ffff8eb97170aa00 x1628271466140544/t0(0) o104->fir-MDT0003@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553035782 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 19 15:49:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f1f7266e-93ed-b149-5bce-99ca2364246b (at 10.8.10.16@o2ib6) reconnecting
Mar 19 15:49:42 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.10.16@o2ib6)
Mar 19 15:49:49 fir-md1-s2 kernel: Lustre: 91218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553035782/real 1553035782] req@ffff8eb97170aa00 x1628271466140544/t0(0) o104->fir-MDT0003@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553035789 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 19 15:49:50 fir-md1-s2 kernel: Lustre: 91217:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eca683d3f00 x1627855896117968/t0(0) o101->87eacf64-f8bb-eb35-7b3b-64dd7de06338@10.8.25.2@o2ib6:25/0 lens 592/3264 e 0 to 0 dl 1553035795 ref 2 fl Interpret:/0/0 rc 0/0
Mar 19 15:49:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client 87eacf64-f8bb-eb35-7b3b-64dd7de06338 (at 10.8.25.2@o2ib6) reconnecting
Mar 19 15:49:56 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.25.2@o2ib6)
Mar 19 15:49:56 fir-md1-s2 kernel: Lustre: 91218:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553035789/real 1553035789] req@ffff8eb97170aa00 x1628271466140544/t0(0) o104->fir-MDT0003@10.8.20.15@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553035796 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Mar 19 15:50:03 fir-md1-s2 kernel: Lustre: fir-MDT0003: Client f1f7266e-93ed-b149-5bce-99ca2364246b (at 10.8.10.16@o2ib6) reconnecting
Mar 19 15:50:03 fir-md1-s2 kernel: LustreError: 91218:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.20.15@o2ib6) returned error from blocking AST (req@ffff8eb97170aa00 x1628271466140544 status -107 rc -107), evict it ns: mdt-fir-MDT0003_UUID lock: ffff8ebe86c186c0/0xefacb2c07b73d60b lrc: 4/0,0 mode: PR/PR res: [0x2800065ab:0x139c2:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x400e81e3b5923760 expref: 13 pid: 90854 timeout: 880496 lvb_type: 0
Mar 19 15:50:03 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.20.15@o2ib6 was evicted due to a lock blocking callback time out: rc -107
Mar 19 15:50:03 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 42s: evicting client at 10.8.20.15@o2ib6 ns: mdt-fir-MDT0003_UUID lock: ffff8ebe86c186c0/0xefacb2c07b73d60b lrc: 3/0,0 mode: PR/PR res: [0x2800065ab:0x139c2:0x0].0x0 bits 0x13/0x0 rrc: 4 type: IBT flags: 0x60200400000020 nid: 10.8.20.15@o2ib6 remote: 0x400e81e3b5923760 expref: 14 pid: 90854 timeout: 0 lvb_type: 0
Mar 19 15:51:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client a9ea0fe4-9f31-0606-5dea-c6a4e81b7ee7 (at 10.8.20.15@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec04ea6e000, cur 1553035909 expire 1553035759 last 1553035682
Mar 19 15:51:49 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 15:56:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client d5530428-6ade-c9de-ffe0-dea132e98805 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eb261544000, cur 1553036187 expire 1553036037 last 1553035960
Mar 19 16:10:41 fir-md1-s2 kernel: LustreError: 91743:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x28000f474:0x124c8:0x0]
Mar 19 16:25:33 fir-md1-s2 kernel: LustreError: 91602:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x28000f474:0x14a87:0x0]
Mar 19 16:44:14 fir-md1-s2 kernel: LustreError: 91713:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x28000f474:0x17ae0:0x0]
Mar 19 17:09:44 fir-md1-s2 kernel: LustreError: 91700:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x28000f474:0x1ba32:0x0]
Mar 19 17:15:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6)
Mar 19 17:15:47 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Mar 19 17:46:14 fir-md1-s2 kernel: LustreError: 91683:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000ece4:0x13ce1:0x0]
Mar 19 18:26:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client b7b00dce-254c-3fee-52a4-03caca689aa9 (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec302e03000, cur 1553045216 expire 1553045066 last 1553044989
Mar 19 18:26:56 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 18:28:54 fir-md1-s2 kernel: Lustre: 91538:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22
Mar 19 18:31:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f53a043d-d024-dbbb-a021-95a52c6fc449 (at 10.8.11.9@o2ib6)
Mar 19 18:31:27 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 19:01:51 fir-md1-s2 kernel: Lustre: 91655:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8eefaf230300 x1626181283381792/t112337080947(0) o36->3c080823-2f82-a8fd-3254-84e9f50418c5@10.9.101.6@o2ib4:26/0 lens 488/3152 e 1 to 0 dl 1553047316 ref 2 fl Interpret:/0/0 rc 0/0
Mar 19 19:01:51 fir-md1-s2 kernel: Lustre: 91673:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ebbe6167b00 x1626181283381760/t0(0) o101->3c080823-2f82-a8fd-3254-84e9f50418c5@10.9.101.6@o2ib4:26/0 lens 480/568 e 1 to 0 dl 1553047316 ref 2 fl Interpret:/0/0 rc 0/0
Mar 19 19:01:51 fir-md1-s2 kernel: Lustre: 91673:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message
Mar 19 19:01:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 3c080823-2f82-a8fd-3254-84e9f50418c5 (at 10.9.101.6@o2ib4) reconnecting
Mar 19 19:01:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.24@o2ib4)
Mar 19 19:01:57 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 19:02:01 fir-md1-s2 kernel: Lustre: 91463:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8edeed622400 x1626123475002944/t0(0) o101->810fe316-e09a-254c-3020-2540e531f84e@10.9.101.7@o2ib4:6/0 lens 600/3264 e 0 to 0 dl 1553047326 ref 2 fl Interpret:/0/0 rc 0/0
Mar 19 19:02:01 fir-md1-s2 kernel: Lustre: 91463:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages
Mar 19 19:02:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1269d82d-2021-a198-cfa7-174e25a867c3 (at 10.9.101.18@o2ib4) reconnecting
Mar 19 19:02:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 19:02:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.18@o2ib4)
Mar 19 19:02:07 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Mar 19 19:02:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 3c080823-2f82-a8fd-3254-84e9f50418c5 (at 10.9.101.6@o2ib4) reconnecting
Mar 19 19:02:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 19:02:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3c080823-2f82-a8fd-3254-84e9f50418c5 (at 10.9.101.6@o2ib4)
Mar 19 19:02:18 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 19:02:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1269d82d-2021-a198-cfa7-174e25a867c3 (at 10.9.101.18@o2ib4) reconnecting
Mar 19 19:02:38 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Mar 19 19:02:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.18@o2ib4)
Mar 19 19:02:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Mar 19 19:03:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 3c080823-2f82-a8fd-3254-84e9f50418c5 (at 10.9.101.6@o2ib4) reconnecting
Mar 19 19:03:00 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Mar 19 19:03:00 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3c080823-2f82-a8fd-3254-84e9f50418c5 (at 10.9.101.6@o2ib4)
Mar 19 19:03:00 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Mar 19 19:03:06 fir-md1-s2 kernel: LustreError: 99935:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553047296, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ee857f086c0/0xefacb2c130fb2283 lrc: 3/0,1 mode: --/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x2/0x0 rrc: 24 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 99935 timeout: 0 lvb_type: 0
Mar 19 19:03:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553047386.91441
Mar 19 19:03:06 fir-md1-s2 kernel: LustreError: 99935:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 10 previous similar messages
Mar 19 19:03:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1269d82d-2021-a198-cfa7-174e25a867c3 (at 10.9.101.18@o2ib4) reconnecting
Mar 19 19:03:09 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Mar 19 19:03:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.18@o2ib4)
Mar 19 19:03:09 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages
Mar 19 19:03:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1269d82d-2021-a198-cfa7-174e25a867c3 (at 10.9.101.18@o2ib4) reconnecting
Mar 19 19:03:40 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Mar 19 19:03:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.18@o2ib4)
Mar 19 19:03:40 fir-md1-s2 kernel: Lustre: Skipped 4 previous similar messages
Mar 19 19:04:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.101.6@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8edbc1bb7980/0xefacb2c130fb2109 lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 24 type: IBT flags: 0x60200400000020 nid: 10.9.101.6@o2ib4 remote: 0xc4d4d1f1ca70768e expref: 197 pid: 91462 timeout: 891988 lvb_type: 0
Mar 19 19:04:05 fir-md1-s2 kernel: LustreError: 91619:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ef06de98000 ns: mdt-fir-MDT0001_UUID lock: ffff8ebf0fce4c80/0xefacb2c130fb2244 lrc: 3/0,0 mode: PW/PW res: 
[0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 21 type: IBT flags: 0x50200400000020 nid: 10.9.101.6@o2ib4 remote: 0xc4d4d1f1ca70769c expref: 167 pid: 91619 timeout: 0 lvb_type: 0 Mar 19 19:04:30 fir-md1-s2 kernel: Lustre: 91655:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eefa5401500 x1626123475250560/t0(0) o101->810fe316-e09a-254c-3020-2540e531f84e@10.9.101.7@o2ib4:5/0 lens 600/3264 e 0 to 0 dl 1553047475 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 19:04:30 fir-md1-s2 kernel: Lustre: 91655:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Mar 19 19:04:35 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.18@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ec770720000/0xefacb2c132bdff22 lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 30 type: IBT flags: 0x60200400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb4441e89e expref: 29 pid: 90865 timeout: 892018 lvb_type: 0 Mar 19 19:04:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 19 19:04:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.9@o2ib4) Mar 19 19:04:36 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Mar 19 19:04:36 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Mar 19 19:05:01 fir-md1-s2 kernel: Lustre: 91585:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eeb0cbecb00 x1627854542223648/t0(0) o101->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:6/0 lens 600/3264 e 0 to 0 dl 1553047506 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 19:05:01 fir-md1-s2 kernel: Lustre: 91585:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Mar 19 19:05:35 fir-md1-s2 kernel: 
LustreError: 91445:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553047445, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec5f9783180/0xefacb2c132be01e5 lrc: 3/0,1 mode: --/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 28 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91445 timeout: 0 lvb_type: 0 Mar 19 19:05:35 fir-md1-s2 kernel: LustreError: 91445:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Mar 19 19:06:06 fir-md1-s2 kernel: LustreError: 91655:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553047476, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8edf71045c40/0xefacb2c1331e6961 lrc: 3/1,0 mode: --/PR res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x13/0x8 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91655 timeout: 0 lvb_type: 0 Mar 19 19:06:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1831b60b-c0b3-6e16-f786-e9804146d690 (at 10.9.101.9@o2ib4) reconnecting Mar 19 19:06:09 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Mar 19 19:06:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.9@o2ib4) Mar 19 19:06:09 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Mar 19 19:07:01 fir-md1-s2 kernel: Lustre: 91433:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb935f63000 x1628035752571264/t0(0) o101->f2f4d52c-2807-60dd-8532-99fa3e9aeefa@10.0.10.3@o2ib7:6/0 lens 576/3264 e 0 to 0 dl 1553047626 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 19:07:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.24@o2ib4 ns: mdt-fir-MDT0001_UUID lock: 
ffff8ec980329200/0xefacb2c132be01bb lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.9.101.24@o2ib4 remote: 0xa5d9cf59062c8110 expref: 36405 pid: 91441 timeout: 892168 lvb_type: 0 Mar 19 19:07:05 fir-md1-s2 kernel: LustreError: 91445:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eebb0678800 ns: mdt-fir-MDT0001_UUID lock: ffff8ec5f9783180/0xefacb2c132be01e5 lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 29 type: IBT flags: 0x50200400000020 nid: 10.9.101.24@o2ib4 remote: 0xa5d9cf59062c8117 expref: 31666 pid: 91445 timeout: 0 lvb_type: 0 Mar 19 19:07:05 fir-md1-s2 kernel: LustreError: 91445:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 19 19:07:05 fir-md1-s2 kernel: Lustre: 91211:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:150s); client may timeout. req@ffff8ebfeaa14500 x1627854542217776/t0(0) o101->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:5/0 lens 480/536 e 0 to 0 dl 1553047475 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 19:07:05 fir-md1-s2 kernel: Lustre: 91211:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Mar 19 19:07:30 fir-md1-s2 kernel: Lustre: 91447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec3defd5d00 x1627854542260848/t0(0) o101->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:5/0 lens 568/0 e 0 to 0 dl 1553047655 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 19 19:07:30 fir-md1-s2 kernel: Lustre: 91447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 19 19:08:35 fir-md1-s2 kernel: LustreError: 91543:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553047625, 90s ago); not entering recovery in server code, just going back to sleep ns: 
mdt-fir-MDT0001_UUID lock: ffff8ecfe1519d40/0xefacb2c134deee5a lrc: 3/0,1 mode: --/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x2/0x0 rrc: 19 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91543 timeout: 0 lvb_type: 0 Mar 19 19:08:35 fir-md1-s2 kernel: LustreError: 91549:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553047625, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed022de0240/0xefacb2c134deee45 lrc: 3/0,1 mode: --/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91549 timeout: 0 lvb_type: 0 Mar 19 19:08:35 fir-md1-s2 kernel: LustreError: 91549:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Mar 19 19:08:35 fir-md1-s2 kernel: LustreError: 91543:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 19 19:08:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1831b60b-c0b3-6e16-f786-e9804146d690 (at 10.9.101.9@o2ib4) reconnecting Mar 19 19:08:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.18@o2ib4) Mar 19 19:08:38 fir-md1-s2 kernel: Lustre: Skipped 14 previous similar messages Mar 19 19:08:38 fir-md1-s2 kernel: Lustre: Skipped 14 previous similar messages Mar 19 19:09:35 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.18@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eec6b3d1680/0xefacb2c134deedc0 lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 18 type: IBT flags: 0x60200400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb4441f3b1 expref: 48 pid: 114050 timeout: 892318 lvb_type: 0 Mar 19 19:10:00 fir-md1-s2 kernel: Lustre: 91533:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not 
sending early reply req@ffff8ec927e9e300 x1626123475881968/t0(0) o101->810fe316-e09a-254c-3020-2540e531f84e@10.9.101.7@o2ib4:5/0 lens 568/0 e 0 to 0 dl 1553047805 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 19 19:10:00 fir-md1-s2 kernel: Lustre: 91533:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Mar 19 19:10:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.7@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed022de0240/0xefacb2c134deee45 lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 15 type: IBT flags: 0x60200400000020 nid: 10.9.101.7@o2ib4 remote: 0xc8b1122665006968 expref: 20 pid: 91549 timeout: 892348 lvb_type: 0 Mar 19 19:10:05 fir-md1-s2 kernel: LustreError: 114050:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8edeed6bfc00 ns: mdt-fir-MDT0001_UUID lock: ffff8ef0617b4140/0xefacb2c134def07c lrc: 3/0,0 mode: PW/PW res: [0x24000e5e6:0x416a:0x0].0x0 bits 0x40/0x0 rrc: 9 type: IBT flags: 0x50200400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb4441f41a expref: 4 pid: 114050 timeout: 0 lvb_type: 0 Mar 19 19:10:05 fir-md1-s2 kernel: LustreError: 114050:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 6 previous similar messages Mar 19 19:10:05 fir-md1-s2 kernel: Lustre: 114050:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:26s); client may timeout. 
req@ffff8ee9846fe600 x1627854542260832/t0(0) o101->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:5/0 lens 480/536 e 0 to 0 dl 1553047779 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 19:10:05 fir-md1-s2 kernel: Lustre: 114050:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 19 19:17:03 fir-md1-s2 kernel: LustreError: 91696:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x28000f487:0x140f4:0x0] Mar 19 21:11:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6) Mar 19 21:11:59 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Mar 19 21:14:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 98bc7d1b-b251-d2f4-d279-8f037b579329 (at 10.8.23.21@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eec7778d400, cur 1553055280 expire 1553055130 last 1553055053 Mar 19 21:14:40 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Mar 19 21:16:48 fir-md1-s2 kernel: Lustre: 91465:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8eddc828c200 x1626259788001568/t0(0) o101->5eae3cf7-a13b-7336-7e1f-fb9bc4861c5d@10.9.108.56@o2ib4:23/0 lens 576/3264 e 1 to 0 dl 1553055413 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 21:16:48 fir-md1-s2 kernel: Lustre: 91465:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 19 21:16:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 148ae75a-e083-e480-3765-d63daa0c5525 (at 10.9.108.55@o2ib4) reconnecting Mar 19 21:16:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 592a51bd-e814-0c15-eeb6-9f1ef8a77f16 (at 10.9.108.38@o2ib4) Mar 19 21:16:54 fir-md1-s2 kernel: Lustre: Skipped 9 previous similar messages Mar 19 21:17:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 5eae3cf7-a13b-7336-7e1f-fb9bc4861c5d (at 10.9.108.56@o2ib4) reconnecting Mar 19 21:17:36 fir-md1-s2 kernel: Lustre: Skipped 
31 previous similar messages Mar 19 21:17:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.53@o2ib4) Mar 19 21:17:58 fir-md1-s2 kernel: Lustre: Skipped 68 previous similar messages Mar 19 21:18:03 fir-md1-s2 kernel: LustreError: 90858:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055393, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ece23300000/0xefacb2c18c9ab4a7 lrc: 3/0,1 mode: --/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 79 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 90858 timeout: 0 lvb_type: 0 Mar 19 21:18:03 fir-md1-s2 kernel: LustreError: 91353:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055393, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eba9122a400/0xefacb2c18c9ab3dc lrc: 3/0,1 mode: --/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 82 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91353 timeout: 0 lvb_type: 0 Mar 19 21:18:03 fir-md1-s2 kernel: LustreError: 91353:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 19 21:18:03 fir-md1-s2 kernel: LustreError: 90858:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 27 previous similar messages Mar 19 21:18:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a42f4a0a-3e8d-4391-375e-5a7dff5ecf36 (at 10.9.108.53@o2ib4) reconnecting Mar 19 21:18:40 fir-md1-s2 kernel: Lustre: Skipped 68 previous similar messages Mar 19 21:19:02 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.108.46@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ecffb20f500/0xefacb2c18c9ab3c7 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 79 type: IBT 
flags: 0x60200400000020 nid: 10.9.108.46@o2ib4 remote: 0x8374126d39604757 expref: 38560 pid: 91529 timeout: 900085 lvb_type: 0 Mar 19 21:19:43 fir-md1-s2 kernel: Lustre: 91243:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb9af2cb000 x1627853341443984/t0(0) o101->df44ff7c-4e8a-070f-774f-84780b4dab3d@10.9.108.48@o2ib4:18/0 lens 600/3264 e 0 to 0 dl 1553055588 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 21:19:43 fir-md1-s2 kernel: Lustre: 91243:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 31 previous similar messages Mar 19 21:19:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.44@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eba9122a400/0xefacb2c18c9ab3dc lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 77 type: IBT flags: 0x60200400000020 nid: 10.9.108.44@o2ib4 remote: 0xe63432aea7141892 expref: 920 pid: 91353 timeout: 900131 lvb_type: 0 Mar 19 21:19:53 fir-md1-s2 kernel: LNet: Service thread pid 91651 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 19 21:19:53 fir-md1-s2 kernel: Pid: 91651, comm: mdt03_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:19:53 fir-md1-s2 kernel: Call Trace: Mar 19 21:19:53 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:19:53 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 19 21:19:53 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:19:53 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:19:53 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:19:53 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:19:53 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:19:53 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:19:53 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553055593.91651 Mar 19 21:19:54 fir-md1-s2 kernel: LNet: Service thread pid 91587 was inactive for 201.41s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 19 21:19:54 fir-md1-s2 kernel: Pid: 91587, comm: mdt03_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:19:54 fir-md1-s2 kernel: Call Trace: Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:19:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:19:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:19:54 fir-md1-s2 kernel: Pid: 91379, comm: mdt00_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:19:54 fir-md1-s2 kernel: Call Trace: Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 19 21:19:54 
fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:19:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:19:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:19:54 fir-md1-s2 kernel: Pid: 90865, comm: mdt03_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:19:54 fir-md1-s2 kernel: Call Trace: Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:19:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:19:54 
fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:19:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:19:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:19:54 fir-md1-s2 kernel: Pid: 91588, comm: mdt00_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:19:54 fir-md1-s2 kernel: Call Trace: Mar 19 21:19:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:19:55 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 19 21:19:55 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:19:55 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:19:55 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:19:55 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:19:55 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:19:55 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:19:55 fir-md1-s2 kernel: LNet: Service thread pid 91499 was inactive for 201.90s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 19 21:20:07 fir-md1-s2 kernel: Lustre: 91448:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-29), not sending early reply req@ffff8eb9af2c9e00 x1627854796379984/t0(0) o101->d85ce513-0df3-7797-47aa-9963f15f7b01@10.9.108.44@o2ib4:12/0 lens 568/0 e 0 to 0 dl 1553055612 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 19 21:20:18 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.53@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ece23300000/0xefacb2c18c9ab4a7 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 77 type: IBT flags: 0x60200400000020 nid: 10.9.108.53@o2ib4 remote: 0xe3c6215e3b154ba4 expref: 19 pid: 90858 timeout: 900161 lvb_type: 0 Mar 19 21:20:18 fir-md1-s2 kernel: LNet: Service thread pid 91665 completed after 225.19s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 21:20:18 fir-md1-s2 kernel: LustreError: 91476:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eeffde44400 ns: mdt-fir-MDT0001_UUID lock: ffff8ec9b1681680/0xefacb2c18c9abac0 lrc: 3/0,0 mode: PR/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x1b/0x0 rrc: 75 type: IBT flags: 0x50200000000000 nid: 10.9.108.46@o2ib4 remote: 0x8374126d3960477a expref: 6 pid: 91476 timeout: 0 lvb_type: 0 Mar 19 21:20:18 fir-md1-s2 kernel: LustreError: 91476:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 19 21:20:18 fir-md1-s2 kernel: Lustre: 91476:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (167:58s); client may timeout. 
req@ffff8ec3b9e38f00 x1627853768014176/t0(0) o101->d9b63d00-c5f5-abd8-cb92-9c6858b85bc1@10.9.108.46@o2ib4:23/0 lens 576/1168 e 1 to 0 dl 1553055560 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 21:20:18 fir-md1-s2 kernel: LNet: Skipped 21 previous similar messages Mar 19 21:20:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.53@o2ib4) Mar 19 21:20:19 fir-md1-s2 kernel: Lustre: Skipped 105 previous similar messages Mar 19 21:20:43 fir-md1-s2 kernel: Lustre: 91066:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eca3f7ece00 x1628372251686352/t112361527744(0) o36->8f21b47b-7476-e5b7-9fb9-298f61099e2d@10.9.108.39@o2ib4:18/0 lens 488/3152 e 0 to 0 dl 1553055648 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 21:20:43 fir-md1-s2 kernel: Lustre: 91066:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 19 21:20:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.47@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8edef67ae300/0xefacb2c18c9ab819 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 75 type: IBT flags: 0x60200400000020 nid: 10.9.108.47@o2ib4 remote: 0x60b73ea2a4b641ad expref: 22 pid: 91472 timeout: 900191 lvb_type: 0 Mar 19 21:20:48 fir-md1-s2 kernel: LNet: Service thread pid 14635 completed after 255.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 19 21:20:48 fir-md1-s2 kernel: LustreError: 91534:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eeffde44400 ns: mdt-fir-MDT0001_UUID lock: ffff8ed185e38d80/0xefacb2c18c9abb5a lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 70 type: IBT flags: 0x50200400000020 nid: 10.9.108.46@o2ib4 remote: 0x8374126d39604781 expref: 2 pid: 91534 timeout: 0 lvb_type: 0 Mar 19 21:20:48 fir-md1-s2 kernel: LustreError: 91534:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 19 21:20:48 fir-md1-s2 kernel: Lustre: 91534:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (167:88s); client may timeout. req@ffff8eed7d6d7800 x1627853768014192/t0(0) o101->d9b63d00-c5f5-abd8-cb92-9c6858b85bc1@10.9.108.46@o2ib4:23/0 lens 480/536 e 1 to 0 dl 1553055560 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 21:20:48 fir-md1-s2 kernel: Lustre: 91534:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Mar 19 21:20:48 fir-md1-s2 kernel: LNet: Skipped 5 previous similar messages Mar 19 21:21:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client bdaf1d13-5cb5-2423-13f1-07bfc2bf3dd3 (at 10.9.108.40@o2ib4) reconnecting Mar 19 21:21:19 fir-md1-s2 kernel: Lustre: Skipped 81 previous similar messages Mar 19 21:22:04 fir-md1-s2 kernel: Lustre: 91448:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ebbd1333300 x1628035903391104/t0(0) o101->f2f4d52c-2807-60dd-8532-99fa3e9aeefa@10.0.10.3@o2ib7:9/0 lens 576/3264 e 0 to 0 dl 1553055729 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 21:22:04 fir-md1-s2 kernel: Lustre: 91448:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Mar 19 21:22:18 fir-md1-s2 kernel: LustreError: 91527:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055648, 90s ago); not entering recovery in server code, 
just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ef039b90d80/0xefacb2c18f54e748 lrc: 3/0,1 mode: --/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 57 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91527 timeout: 0 lvb_type: 0 Mar 19 21:22:18 fir-md1-s2 kernel: LustreError: 91527:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages Mar 19 21:23:09 fir-md1-s2 kernel: LustreError: 91566:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055699, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eca5aa24140/0xefacb2c18fe96339 lrc: 3/1,0 mode: --/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x13/0x8 rrc: 56 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91566 timeout: 0 lvb_type: 0 Mar 19 21:23:18 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.24@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ebfe66c0480/0xefacb2c18f54e477 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 56 type: IBT flags: 0x60200400000020 nid: 10.9.101.24@o2ib4 remote: 0xa5d9cf59062d4652 expref: 20 pid: 91587 timeout: 900341 lvb_type: 0 Mar 19 21:23:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.43@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ef039b90d80/0xefacb2c18f54e748 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 57 type: IBT flags: 0x60200400000020 nid: 10.9.108.43@o2ib4 remote: 0xcca4287e346ed9ee expref: 15 pid: 91527 timeout: 900371 lvb_type: 0 Mar 19 21:23:48 fir-md1-s2 kernel: LustreError: 91587:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ed43ea00400 ns: mdt-fir-MDT0001_UUID lock: 
ffff8ebfe66c0fc0/0xefacb2c18f54f05c lrc: 3/0,0 mode: PR/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x1b/0x0 rrc: 43 type: IBT flags: 0x50200000000000 nid: 10.9.101.24@o2ib4 remote: 0xa5d9cf59062d468a expref: 6 pid: 91587 timeout: 0 lvb_type: 0 Mar 19 21:23:48 fir-md1-s2 kernel: LustreError: 91587:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Mar 19 21:23:48 fir-md1-s2 kernel: Lustre: 91587:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:26s); client may timeout. req@ffff8eead57c9500 x1627854645321120/t0(0) o101->fa565bf9-4e3f-b131-86ef-18427c7a396c@10.9.101.24@o2ib4:18/0 lens 576/1168 e 0 to 0 dl 1553055802 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 21:23:48 fir-md1-s2 kernel: Lustre: 91587:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Mar 19 21:24:13 fir-md1-s2 kernel: Lustre: 91339:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ecfe2740900 x1628050974548560/t0(0) o101->cafd7930-f474-d448-fb1b-a5c0f7fb7020@10.9.108.41@o2ib4:18/0 lens 568/0 e 0 to 0 dl 1553055858 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 19 21:24:13 fir-md1-s2 kernel: Lustre: 91339:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Mar 19 21:24:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.55@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed606e433c0/0xefacb2c191463fd2 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 47 type: IBT flags: 0x60200400000020 nid: 10.9.108.55@o2ib4 remote: 0xdbd26733d6ea9ad8 expref: 871 pid: 99935 timeout: 900431 lvb_type: 0 Mar 19 21:24:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 19 21:24:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection 
restored to (at 10.9.108.45@o2ib4) Mar 19 21:24:49 fir-md1-s2 kernel: Lustre: Skipped 92 previous similar messages Mar 19 21:25:18 fir-md1-s2 kernel: LustreError: 99938:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055828, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecf9c2eb3c0/0xefacb2c191464455 lrc: 3/0,1 mode: --/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 46 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 99938 timeout: 0 lvb_type: 0 Mar 19 21:25:18 fir-md1-s2 kernel: LustreError: 99938:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Mar 19 21:25:48 fir-md1-s2 kernel: LustreError: 91218:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055858, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecfd4d02400/0xefacb2c191954562 lrc: 3/1,0 mode: --/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x13/0x8 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91218 timeout: 0 lvb_type: 0 Mar 19 21:25:48 fir-md1-s2 kernel: LustreError: 91218:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 19 21:25:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f6951712-d609-35d5-7d48-8ccf309c970b (at 10.9.101.19@o2ib4) reconnecting Mar 19 21:25:49 fir-md1-s2 kernel: Lustre: Skipped 92 previous similar messages Mar 19 21:26:18 fir-md1-s2 kernel: LustreError: 91521:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055888, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8edddee86c00/0xefacb2c191e6dfca lrc: 3/1,0 mode: --/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x20/0x0 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 
expref: -99 pid: 91521 timeout: 0 lvb_type: 0 Mar 19 21:26:18 fir-md1-s2 kernel: LustreError: 91521:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Mar 19 21:26:48 fir-md1-s2 kernel: LustreError: 91524:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553055918, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ece91657740/0xefacb2c192385c17 lrc: 3/1,0 mode: --/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x20/0x0 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91524 timeout: 0 lvb_type: 0 Mar 19 21:27:08 fir-md1-s2 kernel: LNet: Service thread pid 91237 was inactive for 200.41s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 19 21:27:08 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 19 21:27:08 fir-md1-s2 kernel: Pid: 91237, comm: mdt00_011 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:27:08 fir-md1-s2 kernel: Call Trace: Mar 19 21:27:08 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:27:08 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 
kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:27:09 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:27:09 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:27:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553056029.91237 Mar 19 21:27:09 fir-md1-s2 kernel: Pid: 91587, comm: mdt03_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:27:09 fir-md1-s2 kernel: Call Trace: Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:27:09 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:27:09 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:27:09 fir-md1-s2 kernel: Pid: 91467, comm: mdt01_066 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:27:09 fir-md1-s2 kernel: Call Trace: Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:27:09 
fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:27:09 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:27:09 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:27:09 fir-md1-s2 kernel: Pid: 91418, comm: mdt00_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:27:09 fir-md1-s2 kernel: Call Trace: Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:27:09 
fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:27:09 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:27:09 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:27:09 fir-md1-s2 kernel: Pid: 14631, comm: mdt02_107 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 21:27:09 fir-md1-s2 kernel: Call Trace: Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 21:27:09 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 21:27:09 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 21:27:09 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 21:27:09 fir-md1-s2 kernel: LNet: Service thread pid 99938 was inactive for 201.12s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 19 21:27:09 fir-md1-s2 kernel: LNet: Skipped 22 previous similar messages Mar 19 21:27:38 fir-md1-s2 kernel: LNet: Service thread pid 91665 was inactive for 200.30s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 19 21:27:38 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 19 21:27:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553056058.91665 Mar 19 21:27:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.19@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed22f6b6780/0xefacb2c191464360 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 46 type: IBT flags: 0x60200400000020 nid: 10.9.101.19@o2ib4 remote: 0xcb7379996fd917f9 expref: 20 pid: 91524 timeout: 900611 lvb_type: 0 Mar 19 21:27:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 19 21:27:48 fir-md1-s2 kernel: LNet: Service thread pid 99938 completed after 239.84s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 21:28:08 fir-md1-s2 kernel: LNet: Service thread pid 91521 was inactive for 200.44s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 19 21:28:08 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 19 21:28:08 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553056088.91521 Mar 19 21:28:18 fir-md1-s2 kernel: LNet: Service thread pid 91237 completed after 269.79s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 21:28:38 fir-md1-s2 kernel: LNet: Service thread pid 91524 was inactive for 200.21s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 19 21:28:38 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 19 21:28:38 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553056118.91524 Mar 19 21:28:43 fir-md1-s2 kernel: Lustre: 91216:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eee6db06300 x1626170850546800/t0(0) o101->476e933b-2664-4b57-53cf-d95b660fb2b3@10.9.101.5@o2ib4:18/0 lens 568/0 e 0 to 0 dl 1553056128 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 19 21:28:43 fir-md1-s2 kernel: Lustre: 91216:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages Mar 19 21:28:48 fir-md1-s2 kernel: LNet: Service thread pid 91621 completed after 299.80s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 21:29:18 fir-md1-s2 kernel: LustreError: 91250:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553056068, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec440272ac0/0xefacb2c193d23438 lrc: 3/1,0 mode: --/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x20/0x0 rrc: 45 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91250 timeout: 0 lvb_type: 0 Mar 19 21:31:09 fir-md1-s2 kernel: LNet: Service thread pid 91250 was inactive for 200.69s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 19 21:31:09 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553056269.91250 Mar 19 21:31:18 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.48@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ede1e7a4140/0xefacb2c191464693 lrc: 3/0,0 mode: PW/PW res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x40/0x0 rrc: 44 type: IBT flags: 0x60200400000020 nid: 10.9.108.48@o2ib4 remote: 0x3cfd0642dc592920 expref: 39915 pid: 91621 timeout: 900821 lvb_type: 0 Mar 19 21:31:18 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Mar 19 21:31:18 fir-md1-s2 kernel: LNet: Service thread pid 91623 completed after 450.26s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 21:31:18 fir-md1-s2 kernel: LustreError: 91181:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ef04ff9e000 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfa2855580/0xefacb2c194235eae lrc: 3/0,0 mode: PR/PR res: [0x24000dd55:0x1cca7:0x0].0x0 bits 0x20/0x0 rrc: 36 type: IBT flags: 0x50200000000000 nid: 10.9.101.5@o2ib4 remote: 0xc6cd4826984668c0 expref: 2 pid: 91181 timeout: 0 lvb_type: 0 Mar 19 21:31:18 fir-md1-s2 kernel: LustreError: 91181:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 19 21:31:18 fir-md1-s2 kernel: Lustre: 91181:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:150s); client may timeout. 
req@ffff8eee6db06300 x1626170850546800/t0(0) o101->476e933b-2664-4b57-53cf-d95b660fb2b3@10.9.101.5@o2ib4:18/0 lens 568/1688 e 0 to 0 dl 1553056128 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 21:31:18 fir-md1-s2 kernel: LNet: Skipped 13 previous similar messages Mar 19 21:31:18 fir-md1-s2 kernel: LustreError: 90865:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8eef0a359b00 x1628271872173456/t0(0) o104->fir-MDT0001@10.9.108.48@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 19 21:31:52 fir-md1-s2 kernel: LustreError: 91530:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ee8db33c800 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfe1a6f2c0/0xefacb2c196318820 lrc: 3/0,0 mode: PR/PR res: [0x24000cdb1:0x1e2:0x0].0x0 bits 0x20/0x0 rrc: 17 type: IBT flags: 0x50200000000000 nid: 10.9.101.49@o2ib4 remote: 0x6401da3efac5b6be expref: 9 pid: 91530 timeout: 0 lvb_type: 0 Mar 19 21:31:52 fir-md1-s2 kernel: LustreError: 91530:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 6 previous similar messages Mar 19 21:43:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 98bc7d1b-b251-d2f4-d279-8f037b579329 (at 10.8.23.21@o2ib6) Mar 19 21:43:20 fir-md1-s2 kernel: Lustre: Skipped 119 previous similar messages Mar 19 21:45:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client ee572418-61fd-bb0f-2f27-b402c449535f (at 10.8.11.9@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8ec904aa8000, cur 1553057113 expire 1553056963 last 1553056886 Mar 19 21:45:13 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Mar 19 22:45:52 fir-md1-s2 kernel: Lustre: 91502:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8eb65c496900 x1626143084349728/t0(0) o101->a9fc6563-6d47-034e-91c7-6822b2414257@10.9.101.26@o2ib4:27/0 lens 576/3264 e 1 to 0 dl 1553060757 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 22:45:52 fir-md1-s2 kernel: Lustre: 91502:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Mar 19 22:45:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client a9fc6563-6d47-034e-91c7-6822b2414257 (at 10.9.101.26@o2ib4) reconnecting Mar 19 22:45:58 fir-md1-s2 kernel: Lustre: Skipped 92 previous similar messages Mar 19 22:45:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to a9fc6563-6d47-034e-91c7-6822b2414257 (at 10.9.101.26@o2ib4) Mar 19 22:45:58 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Mar 19 22:47:07 fir-md1-s2 kernel: LustreError: 91609:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553060737, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecac7eb3f00/0xefacb2c1c6e50fff lrc: 3/0,1 mode: --/PW res: [0x24000ddc4:0xffaa:0x0].0x0 bits 0x40/0x0 rrc: 71 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91609 timeout: 0 lvb_type: 0 Mar 19 22:47:07 fir-md1-s2 kernel: LustreError: 91609:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 29 previous similar messages Mar 19 22:47:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client ae2661a8-bdb6-51b2-7483-d53084bfe4c1 (at 10.9.101.27@o2ib4) reconnecting Mar 19 22:47:10 fir-md1-s2 kernel: Lustre: Skipped 51 previous similar messages Mar 19 22:47:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 
a9fc6563-6d47-034e-91c7-6822b2414257 (at 10.9.101.26@o2ib4) Mar 19 22:47:22 fir-md1-s2 kernel: Lustre: Skipped 57 previous similar messages Mar 19 22:48:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.31@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ecf7b744ec0/0xefacb2c1c6e50e8c lrc: 3/0,0 mode: PW/PW res: [0x24000ddc4:0xffaa:0x0].0x0 bits 0x40/0x0 rrc: 71 type: IBT flags: 0x60200400000020 nid: 10.9.101.31@o2ib4 remote: 0x5b828eff93c9c231 expref: 169 pid: 91648 timeout: 905430 lvb_type: 0 Mar 19 22:48:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 19 22:48:32 fir-md1-s2 kernel: Lustre: 91437:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec4aa6d0300 x1626604637614544/t0(0) o101->c72dbe07-bbb6-4034-feaa-0d7e6ff81ec3@10.9.101.47@o2ib4:7/0 lens 600/3264 e 0 to 0 dl 1553060917 ref 2 fl Interpret:/0/0 rc 0/0 Mar 19 22:48:32 fir-md1-s2 kernel: Lustre: 91437:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Mar 19 22:48:58 fir-md1-s2 kernel: LNet: Service thread pid 91211 was inactive for 200.70s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 19 22:48:58 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 19 22:48:58 fir-md1-s2 kernel: Pid: 91211, comm: mdt00_005 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 22:48:58 fir-md1-s2 kernel: Call Trace: Mar 19 22:48:58 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 22:48:58 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 22:48:58 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 19 22:48:58 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 19 22:48:58 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 19 22:48:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 22:48:58 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 22:48:58 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 22:48:58 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 22:48:58 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 22:48:58 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553060938.91211 Mar 19 22:48:59 fir-md1-s2 kernel: LNet: Service thread pid 91498 was inactive for 201.90s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 19 22:48:59 fir-md1-s2 kernel: Pid: 91498, comm: mdt00_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 22:48:59 fir-md1-s2 kernel: Call Trace: Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 22:48:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 22:48:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 22:48:59 fir-md1-s2 kernel: Pid: 91534, comm: mdt03_015 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 22:48:59 fir-md1-s2 kernel: Call Trace: Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 22:48:59 
fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 22:48:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 22:48:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 22:48:59 fir-md1-s2 kernel: Pid: 91240, comm: mdt03_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 22:48:59 fir-md1-s2 kernel: Call Trace: Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 22:48:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 22:49:00 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 22:49:00 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 22:49:00 fir-md1-s2 kernel: Pid: 91339, comm: mdt01_032 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 19 22:49:00 
fir-md1-s2 kernel: Call Trace: Mar 19 22:49:00 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 19 22:49:00 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 19 22:49:00 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 19 22:49:00 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 19 22:49:00 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 19 22:49:00 fir-md1-s2 kernel: LNet: Service thread pid 91455 was inactive for 202.37s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 19 22:49:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.25@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed5ec205a00/0xefacb2c1c6e511b8 lrc: 3/0,0 mode: PW/PW res: [0x24000ddc4:0xffaa:0x0].0x0 bits 0x40/0x0 rrc: 65 type: IBT flags: 0x60200400000020 nid: 10.9.101.25@o2ib4 remote: 0x130990a3ab37e94e expref: 38394 pid: 91521 timeout: 905490 lvb_type: 0 Mar 19 22:49:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 19 22:49:07 fir-md1-s2 kernel: LNet: Service thread pid 91544 completed after 210.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 22:49:07 fir-md1-s2 kernel: LustreError: 91459:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eea18fd4400 ns: mdt-fir-MDT0001_UUID lock: ffff8ed997e9a400/0xefacb2c1c6e513a2 lrc: 3/0,0 mode: PR/PR res: [0x24000ddc4:0xffaa:0x0].0x0 bits 0x20/0x0 rrc: 55 type: IBT flags: 0x50200400000020 nid: 10.9.101.31@o2ib4 remote: 0x5b828eff93c9c246 expref: 2 pid: 91459 timeout: 0 lvb_type: 0 Mar 19 22:49:07 fir-md1-s2 kernel: Lustre: 91459:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (168:42s); client may timeout. 
req@ffff8ed8cd052700 x1628086894204352/t0(0) o101->09e9e1d9-5be8-aaa1-5ac3-b00b2b9bfed3@10.9.101.31@o2ib4:27/0 lens 568/1688 e 1 to 0 dl 1553060905 ref 1 fl Complete:/0/0 rc -107/-107 Mar 19 22:49:07 fir-md1-s2 kernel: Lustre: 91459:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Mar 19 22:49:07 fir-md1-s2 kernel: LNet: Skipped 10 previous similar messages Mar 19 22:49:09 fir-md1-s2 kernel: LustreError: 91609:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ec25b33b900 x1628271923322000/t0(0) o104->fir-MDT0001@10.9.101.25@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 19 22:49:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 38e3b9a9-e795-c201-f4b8-b5225592d91e (at 10.9.101.42@o2ib4) reconnecting Mar 19 22:49:29 fir-md1-s2 kernel: Lustre: Skipped 71 previous similar messages Mar 19 22:49:37 fir-md1-s2 kernel: LNet: Service thread pid 91498 completed after 239.52s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 19 22:49:37 fir-md1-s2 kernel: LustreError: 91523:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ebf8e142400 ns: mdt-fir-MDT0001_UUID lock: ffff8ed472a1b600/0xefacb2c1c8839a44 lrc: 3/0,0 mode: PR/PR res: [0x24000ddc4:0xffaa:0x0].0x0 bits 0x1b/0x0 rrc: 45 type: IBT flags: 0x50200400000020 nid: 10.9.101.33@o2ib4 remote: 0x1484515c26add67a expref: 5 pid: 91523 timeout: 0 lvb_type: 0 Mar 19 22:49:37 fir-md1-s2 kernel: Lustre: 91181:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:60s); client may timeout. 
req@ffff8eec597ea400 x1627859963698336/t0(0) o55->fae90558-7103-8632-b167-b669fa6c1d82@10.9.101.33@o2ib4:7/0 lens 472/192 e 0 to 0 dl 1553060917 ref 1 fl Complete:/0/0 rc -22/-22 Mar 19 22:49:37 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 20 00:25:53 fir-md1-s2 kernel: Lustre: 91432:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ec81ff55400 x1626123514250256/t0(0) o55->810fe316-e09a-254c-3020-2540e531f84e@10.9.101.7@o2ib4:28/0 lens 472/224 e 1 to 0 dl 1553066758 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 00:25:53 fir-md1-s2 kernel: Lustre: 91432:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Mar 20 00:25:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7f326dd9-6b3c-45f1-d1dc-35b8a959cac2 (at 10.9.101.16@o2ib4) reconnecting Mar 20 00:25:59 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Mar 20 00:25:59 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 810fe316-e09a-254c-3020-2540e531f84e (at 10.9.101.7@o2ib4) Mar 20 00:25:59 fir-md1-s2 kernel: Lustre: Skipped 71 previous similar messages Mar 20 00:26:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.9.101.9@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eda0731da00/0xefacb2c21603a80f lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 41 type: IBT flags: 0x60200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e3a3be expref: 24 pid: 91583 timeout: 911310 lvb_type: 0 Mar 20 00:26:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 00:26:07 fir-md1-s2 kernel: LustreError: 91556:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ee8cfbdc000 ns: mdt-fir-MDT0001_UUID lock: ffff8eba07e44140/0xefacb2c21603a855 lrc: 3/0,0 mode: PW/PW res: 
[0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 38 type: IBT flags: 0x50200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e3a3c5 expref: 8 pid: 91556 timeout: 0 lvb_type: 0 Mar 20 00:26:07 fir-md1-s2 kernel: LustreError: 91556:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 20 00:26:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 810fe316-e09a-254c-3020-2540e531f84e (at 10.9.101.7@o2ib4) Mar 20 00:26:20 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Mar 20 00:26:24 fir-md1-s2 kernel: Lustre: 91607:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-26), not sending early reply req@ffff8ede8873a700 x1626131808591472/t0(0) o101->7f326dd9-6b3c-45f1-d1dc-35b8a959cac2@10.9.101.16@o2ib4:29/0 lens 576/3264 e 0 to 0 dl 1553066789 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 00:26:24 fir-md1-s2 kernel: Lustre: 91607:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Mar 20 00:26:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 56f5734c-97c1-4112-5a94-6bf865e15363 (at 10.9.101.1@o2ib4) reconnecting Mar 20 00:26:38 fir-md1-s2 kernel: Lustre: Skipped 8 previous similar messages Mar 20 00:27:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.16@o2ib4) Mar 20 00:27:01 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Mar 20 00:27:08 fir-md1-s2 kernel: LustreError: 91446:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553066738, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed7c62b8b40/0xefacb2c21603a943 lrc: 3/0,1 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x2/0x0 rrc: 41 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91446 timeout: 0 lvb_type: 0 Mar 20 00:27:08 fir-md1-s2 kernel: LustreError: 91446:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 13 previous 
similar messages Mar 20 00:27:37 fir-md1-s2 kernel: LustreError: 91585:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553066767, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec0417cec00/0xefacb2c2166183ea lrc: 3/1,0 mode: --/PR res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x20/0x0 rrc: 41 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91585 timeout: 0 lvb_type: 0 Mar 20 00:27:37 fir-md1-s2 kernel: LustreError: 91585:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Mar 20 00:27:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 89beb8b3-791a-f6c6-2375-0d31fb61e8aa (at 10.9.101.11@o2ib4) reconnecting Mar 20 00:27:42 fir-md1-s2 kernel: Lustre: Skipped 25 previous similar messages Mar 20 00:28:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 810fe316-e09a-254c-3020-2540e531f84e (at 10.9.101.7@o2ib4) Mar 20 00:28:26 fir-md1-s2 kernel: Lustre: Skipped 30 previous similar messages Mar 20 00:28:37 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.16@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eca83679200/0xefacb2c21603a86a lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 41 type: IBT flags: 0x60200400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643a9a4b2d expref: 53 pid: 91443 timeout: 911460 lvb_type: 0 Mar 20 00:28:37 fir-md1-s2 kernel: LustreError: 14624:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ef04aab6000 ns: mdt-fir-MDT0001_UUID lock: ffff8eb64f6f2400/0xefacb2c21603a8f6 lrc: 3/0,0 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 40 type: IBT flags: 0x54a01400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643a9a4b34 expref: 33 pid: 14624 timeout: 0 lvb_type: 0 Mar 20 00:28:37 fir-md1-s2 kernel: Lustre: 
14610:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:149s); client may timeout. req@ffff8ebd4cb4ec00 x1626174027348944/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:8/0 lens 576/1168 e 0 to 0 dl 1553066768 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 00:28:37 fir-md1-s2 kernel: Lustre: 14610:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Mar 20 00:29:02 fir-md1-s2 kernel: Lustre: 91612:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ecdefff3300 x1626174027370400/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:7/0 lens 568/0 e 0 to 0 dl 1553066947 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 00:29:02 fir-md1-s2 kernel: Lustre: 91612:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 21 previous similar messages Mar 20 00:30:07 fir-md1-s2 kernel: LustreError: 99935:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553066917, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eec4dac9680/0xefacb2c21844d50b lrc: 3/0,1 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 52 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 99935 timeout: 0 lvb_type: 0 Mar 20 00:30:07 fir-md1-s2 kernel: LustreError: 99935:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 22 previous similar messages Mar 20 00:30:10 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f6951712-d609-35d5-7d48-8ccf309c970b (at 10.9.101.19@o2ib4) reconnecting Mar 20 00:30:10 fir-md1-s2 kernel: Lustre: Skipped 37 previous similar messages Mar 20 00:31:05 fir-md1-s2 kernel: Lustre: 91215:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ebcf4722700 x1628036147407824/t0(0) 
o101->f2f4d52c-2807-60dd-8532-99fa3e9aeefa@10.0.10.3@o2ib7:10/0 lens 576/3264 e 0 to 0 dl 1553067070 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 00:31:05 fir-md1-s2 kernel: Lustre: 91215:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Mar 20 00:31:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.9@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ecdfdb8b180/0xefacb2c21844d4d3 lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 54 type: IBT flags: 0x60200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e3a611 expref: 27 pid: 91585 timeout: 911610 lvb_type: 0 Mar 20 00:31:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 00:31:11 fir-md1-s2 kernel: Lustre: Skipped 42 previous similar messages Mar 20 00:31:37 fir-md1-s2 kernel: LustreError: 99932:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ebd4ca86000 ns: mdt-fir-MDT0001_UUID lock: ffff8ecdfdb8b3c0/0xefacb2c21844d93a lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 48 type: IBT flags: 0x50200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e3a618 expref: 7 pid: 99932 timeout: 0 lvb_type: 0 Mar 20 00:31:37 fir-md1-s2 kernel: LustreError: 99932:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Mar 20 00:31:37 fir-md1-s2 kernel: Lustre: 99932:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:26s); client may timeout. 
req@ffff8ee91ae7f500 x1626174027370272/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:7/0 lens 480/536 e 0 to 0 dl 1553067071 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 00:31:37 fir-md1-s2 kernel: Lustre: 99932:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Mar 20 00:32:37 fir-md1-s2 kernel: LustreError: 91446:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067067, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec01981d580/0xefacb2c21a193905 lrc: 3/1,0 mode: --/PR res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x20/0x0 rrc: 42 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91446 timeout: 0 lvb_type: 0 Mar 20 00:32:37 fir-md1-s2 kernel: LustreError: 91446:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Mar 20 00:33:07 fir-md1-s2 kernel: LustreError: 90855:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067097, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec29ef2b840/0xefacb2c21a739a97 lrc: 3/1,0 mode: --/PR res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x13/0x8 rrc: 42 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 90855 timeout: 0 lvb_type: 0 Mar 20 00:33:07 fir-md1-s2 kernel: LustreError: 90855:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Mar 20 00:34:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.7@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eddccbac140/0xefacb2c21844dae5 lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 42 type: IBT flags: 0x60200400000020 nid: 10.9.101.7@o2ib4 remote: 0xc8b112266502dd46 expref: 42 pid: 91246 timeout: 911790 
lvb_type: 0 Mar 20 00:34:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 00:34:07 fir-md1-s2 kernel: LustreError: 91440:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eea3c748c00 ns: mdt-fir-MDT0001_UUID lock: ffff8ece9c6df980/0xefacb2c21a1938f7 lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 38 type: IBT flags: 0x50200400000020 nid: 10.9.101.7@o2ib4 remote: 0xc8b112266502de8f expref: 30 pid: 91440 timeout: 0 lvb_type: 0 Mar 20 00:34:07 fir-md1-s2 kernel: LustreError: 91440:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 00:34:07 fir-md1-s2 kernel: Lustre: 91446:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:150s); client may timeout. req@ffff8eb94de8e900 x1626208314343984/t0(0) o101->34c03761-c352-7fa2-a35d-e32b61a041a3@10.9.101.53@o2ib4:7/0 lens 568/1688 e 0 to 0 dl 1553067097 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 00:34:32 fir-md1-s2 kernel: Lustre: 91254:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec953e63c00 x1626131808641344/t0(0) o101->7f326dd9-6b3c-45f1-d1dc-35b8a959cac2@10.9.101.16@o2ib4:7/0 lens 568/0 e 0 to 0 dl 1553067277 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 00:34:32 fir-md1-s2 kernel: Lustre: 91254:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 18 previous similar messages Mar 20 00:34:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 56f5734c-97c1-4112-5a94-6bf865e15363 (at 10.9.101.1@o2ib4) reconnecting Mar 20 00:34:38 fir-md1-s2 kernel: Lustre: Skipped 73 previous similar messages Mar 20 00:35:37 fir-md1-s2 kernel: LustreError: 99936:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067247, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: 
ffff8eec462a4a40/0xefacb2c21c5db2b5 lrc: 3/0,1 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x21/0x0 rrc: 39 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 99936 timeout: 0 lvb_type: 0 Mar 20 00:35:37 fir-md1-s2 kernel: LustreError: 90864:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067247, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eef69742880/0xefacb2c21c5db2ae lrc: 3/0,1 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 41 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 90864 timeout: 0 lvb_type: 0 Mar 20 00:35:37 fir-md1-s2 kernel: LustreError: 91533:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067247, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed076c56c00/0xefacb2c21c5db292 lrc: 3/0,1 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 38 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91533 timeout: 0 lvb_type: 0 Mar 20 00:35:37 fir-md1-s2 kernel: LustreError: 90864:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 00:35:37 fir-md1-s2 kernel: LustreError: 91533:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 00:35:37 fir-md1-s2 kernel: LustreError: 99936:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Mar 20 00:36:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.9@o2ib4) Mar 20 00:36:11 fir-md1-s2 kernel: Lustre: Skipped 86 previous similar messages Mar 20 00:36:37 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.19@o2ib4 ns: mdt-fir-MDT0001_UUID lock: 
ffff8ed0237fda00/0xefacb2c21c5db28b lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 38 type: IBT flags: 0x60200400000020 nid: 10.9.101.19@o2ib4 remote: 0xcb7379996fda9f97 expref: 44 pid: 91223 timeout: 911940 lvb_type: 0 Mar 20 00:37:28 fir-md1-s2 kernel: LNet: Service thread pid 99934 was inactive for 200.61s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 00:37:28 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 00:37:28 fir-md1-s2 kernel: Pid: 99934, comm: mdt03_031 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:37:28 fir-md1-s2 kernel: Call Trace: Mar 20 00:37:28 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:37:28 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:37:28 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 00:37:28 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:37:28 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:37:28 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:37:28 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:37:28 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:37:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553067448.99934 Mar 20 00:37:29 fir-md1-s2 kernel: LNet: Service thread pid 
91590 was inactive for 202.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 00:37:29 fir-md1-s2 kernel: Pid: 91590, comm: mdt00_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:37:29 fir-md1-s2 kernel: Call Trace: Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:37:29 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:37:29 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:37:29 fir-md1-s2 kernel: Pid: 99932, comm: mdt03_029 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:37:29 fir-md1-s2 kernel: Call Trace: Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] 
mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:37:29 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:37:29 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:37:29 fir-md1-s2 kernel: Pid: 91244, comm: mdt03_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:37:29 fir-md1-s2 kernel: Call Trace: Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] 
ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:37:29 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:37:29 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:37:29 fir-md1-s2 kernel: Pid: 91512, comm: mdt01_081 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:37:29 fir-md1-s2 kernel: Call Trace: Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:37:29 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:37:30 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:37:30 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:37:30 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:37:30 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:37:30 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:37:30 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:37:30 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:37:30 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:37:30 
fir-md1-s2 kernel: LNet: Service thread pid 91486 was inactive for 202.58s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 00:37:30 fir-md1-s2 kernel: LNet: Skipped 8 previous similar messages Mar 20 00:38:07 fir-md1-s2 kernel: LustreError: 91619:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067397, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec04cdc6780/0xefacb2c21e4fac31 lrc: 3/1,0 mode: --/PR res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x20/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91619 timeout: 0 lvb_type: 0 Mar 20 00:39:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.1@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed076c56c00/0xefacb2c21c5db292 lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 37 type: IBT flags: 0x60200400000020 nid: 10.9.101.1@o2ib4 remote: 0x10c07d3d7727a4fc expref: 1159 pid: 91533 timeout: 912090 lvb_type: 0 Mar 20 00:39:07 fir-md1-s2 kernel: LNet: Service thread pid 91244 completed after 299.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 00:39:32 fir-md1-s2 kernel: Lustre: 91623:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ebce0473000 x1626208314436448/t0(0) o101->34c03761-c352-7fa2-a35d-e32b61a041a3@10.9.101.53@o2ib4:7/0 lens 568/0 e 0 to 0 dl 1553067577 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 00:39:32 fir-md1-s2 kernel: Lustre: 91623:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 14 previous similar messages Mar 20 00:39:57 fir-md1-s2 kernel: LNet: Service thread pid 91619 was inactive for 200.21s. 
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 00:39:57 fir-md1-s2 kernel: LNet: Skipped 7 previous similar messages Mar 20 00:39:57 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553067597.91619 Mar 20 00:40:37 fir-md1-s2 kernel: LustreError: 91381:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067547, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ebd5ae44a40/0xefacb2c2202b925a lrc: 3/1,0 mode: --/PR res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x20/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91381 timeout: 0 lvb_type: 0 Mar 20 00:41:37 fir-md1-s2 kernel: LNet: Service thread pid 90864 completed after 449.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 00:41:37 fir-md1-s2 kernel: LNet: Service thread pid 99936 completed after 449.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 00:41:37 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 20 00:42:28 fir-md1-s2 kernel: LNet: Service thread pid 91381 was inactive for 200.73s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 00:42:28 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 00:42:28 fir-md1-s2 kernel: Pid: 91381, comm: mdt00_030 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:42:28 fir-md1-s2 kernel: Call Trace: Mar 20 00:42:28 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:42:28 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:42:28 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 00:42:28 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:42:28 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:42:28 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:42:28 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:42:28 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:42:28 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553067748.91381 Mar 20 00:42:41 fir-md1-s2 kernel: LNet: Service thread pid 14634 was inactive for 200.20s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 00:42:41 fir-md1-s2 kernel: Pid: 14634, comm: mdt00_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 00:42:41 fir-md1-s2 kernel: Call Trace: Mar 20 00:42:41 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 00:42:41 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 00:42:41 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Mar 20 00:42:41 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 00:42:41 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 00:42:41 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 00:42:41 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 00:42:41 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 00:42:41 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 00:42:41 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553067761.14634 Mar 20 00:43:07 fir-md1-s2 kernel: LustreError: 91246:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553067697, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecdff2ada00/0xefacb2c221fbbee1 lrc: 3/0,1 mode: --/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91246 timeout: 0 lvb_type: 0 
Mar 20 00:43:07 fir-md1-s2 kernel: LustreError: 91246:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 20 00:43:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1831b60b-c0b3-6e16-f786-e9804146d690 (at 10.9.101.9@o2ib4) reconnecting Mar 20 00:43:25 fir-md1-s2 kernel: Lustre: Skipped 144 previous similar messages Mar 20 00:44:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.16@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ebff45e8000/0xefacb2c21c5db2f4 lrc: 3/0,0 mode: PW/PW res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x60200400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643a9a4f39 expref: 40 pid: 14615 timeout: 912390 lvb_type: 0 Mar 20 00:44:07 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 00:44:07 fir-md1-s2 kernel: LNet: Service thread pid 91512 completed after 599.91s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 00:44:07 fir-md1-s2 kernel: LustreError: 91381:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ee953764800 ns: mdt-fir-MDT0001_UUID lock: ffff8ebd5ae44a40/0xefacb2c2202b925a lrc: 3/0,0 mode: PR/PR res: [0x24000e4ad:0xc014:0x0].0x0 bits 0x20/0x0 rrc: 32 type: IBT flags: 0x50200000000000 nid: 10.9.101.53@o2ib4 remote: 0x29f37b2e54154496 expref: 2 pid: 91381 timeout: 0 lvb_type: 0 Mar 20 00:44:07 fir-md1-s2 kernel: LustreError: 91381:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 20 00:44:07 fir-md1-s2 kernel: Lustre: 91381:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. 
req@ffff8ebce0473000 x1626208314436448/t0(0) o101->34c03761-c352-7fa2-a35d-e32b61a041a3@10.9.101.53@o2ib4:7/0 lens 568/1688 e 0 to 0 dl 1553067701 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 00:44:07 fir-md1-s2 kernel: LNet: Skipped 10 previous similar messages Mar 20 00:56:09 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 8a895c8b-0904-d7a9-2e7a-626cb7240359 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec5ea7e3000, cur 1553068569 expire 1553068419 last 1553068342 Mar 20 00:56:09 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Mar 20 00:56:13 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8a895c8b-0904-d7a9-2e7a-626cb7240359 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec332764c00, cur 1553068573 expire 1553068423 last 1553068346 Mar 20 00:58:24 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6) Mar 20 00:58:24 fir-md1-s2 kernel: Lustre: Skipped 137 previous similar messages Mar 20 01:31:44 fir-md1-s2 kernel: LustreError: 91577:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0003: BRW to missing obj [0x28000e492:0xe4e:0x0] Mar 20 01:36:21 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553070974/real 1553070974] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553070981 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 01:36:21 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 20 01:36:28 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553070981/real 1553070981] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 
1553070988 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 01:36:35 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553070988/real 1553070988] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553070995 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 01:36:39 fir-md1-s2 kernel: Lustre: 91488:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec891a01800 x1627848346959168/t0(0) o101->2d7c1da3-f558-5926-b626-eca8459eef0c@10.8.20.1@o2ib6:14/0 lens 592/3264 e 0 to 0 dl 1553071004 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 01:36:39 fir-md1-s2 kernel: Lustre: 91488:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Mar 20 01:36:42 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553070995/real 1553070995] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553071002 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 01:36:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 0c234f9d-15bc-4bac-1920-daec432d55cb (at 10.8.20.30@o2ib6) reconnecting Mar 20 01:36:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.22.2@o2ib6) Mar 20 01:36:45 fir-md1-s2 kernel: Lustre: Skipped 33 previous similar messages Mar 20 01:36:56 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553071009/real 1553071009] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553071016 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 01:36:56 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 20 01:37:16 fir-md1-s2 kernel: Lustre: 
fir-MDT0001: Connection restored to (at 10.9.0.64@o2ib4) Mar 20 01:37:16 fir-md1-s2 kernel: Lustre: Skipped 18 previous similar messages Mar 20 01:37:17 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553071030/real 1553071030] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553071037 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 01:37:17 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 20 01:37:44 fir-md1-s2 kernel: LustreError: 91540:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553070974, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec8a07206c0/0xefacb2c25eeee219 lrc: 3/1,0 mode: --/PR res: [0x24000cd10:0x2429:0x0].0x0 bits 0x13/0x0 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91540 timeout: 0 lvb_type: 0 Mar 20 01:37:44 fir-md1-s2 kernel: LustreError: 91540:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 15 previous similar messages Mar 20 01:37:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.0.64@o2ib4) Mar 20 01:37:47 fir-md1-s2 kernel: Lustre: Skipped 19 previous similar messages Mar 20 01:37:48 fir-md1-s2 kernel: Lustre: 91499:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ed1d6c39200 x1626223691519712/t0(0) o101->4f343c6c-b349-2002-fbcc-234c62ad21b0@10.8.11.13@o2ib6:23/0 lens 592/3264 e 0 to 0 dl 1553071073 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 01:37:48 fir-md1-s2 kernel: Lustre: 91499:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Mar 20 01:37:52 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow 
reply: [sent 1553071065/real 1553071065] req@ffff8eb823b54200 x1628272093512064/t0(0) o104->fir-MDT0001@10.8.11.4@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553071072 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 01:37:52 fir-md1-s2 kernel: Lustre: 14615:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 20 01:38:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 5b5b9014-7e20-c609-9658-f3d8dc126e0d (at 10.8.10.23@o2ib6) reconnecting Mar 20 01:38:16 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 01:38:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.13@o2ib6) Mar 20 01:38:25 fir-md1-s2 kernel: Lustre: Skipped 38 previous similar messages Mar 20 01:38:48 fir-md1-s2 kernel: LustreError: 14615:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.11.4@o2ib6) failed to reply to blocking AST (req@ffff8eb823b54200 x1628272093512064 status 0 rc -110), evict it ns: mdt-fir-MDT0001_UUID lock: ffff8ec87332e780/0xefacb2c25df623eb lrc: 4/0,0 mode: PR/PR res: [0x24000cd10:0x2429:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 10.8.11.4@o2ib6 remote: 0x4b35ae9538a29a3a expref: 26 pid: 91269 timeout: 915814 lvb_type: 0 Mar 20 01:38:48 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0001: A client on nid 10.8.11.4@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Mar 20 01:38:48 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 154s: evicting client at 10.8.11.4@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ec87332e780/0xefacb2c25df623eb lrc: 3/0,0 mode: PR/PR res: [0x24000cd10:0x2429:0x0].0x0 bits 0x13/0x0 rrc: 25 type: IBT flags: 0x60200400000020 nid: 10.8.11.4@o2ib6 remote: 0x4b35ae9538a29a3a expref: 27 pid: 91269 timeout: 0 lvb_type: 0 Mar 20 01:38:49 fir-md1-s2 kernel: Lustre: 91614:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (142:1s); client 
may timeout. req@ffff8ec7f8f01800 x1627867820823568/t0(0) o101->09660f1c-7705-2a63-52d0-beaa996b06ba@10.9.0.64@o2ib4:15/0 lens 592/536 e 0 to 0 dl 1553071128 ref 1 fl Complete:/0/0 rc 0/0 Mar 20 01:38:49 fir-md1-s2 kernel: Lustre: 91614:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 21 previous similar messages Mar 20 01:38:54 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client a7de0fce-472e-35aa-c25d-8ab8c8ad252b (at 10.8.11.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eec55fa9c00, cur 1553071134 expire 1553070984 last 1553070907 Mar 20 01:41:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to a7de0fce-472e-35aa-c25d-8ab8c8ad252b (at 10.8.11.4@o2ib6) Mar 20 01:41:54 fir-md1-s2 kernel: Lustre: Skipped 2 previous similar messages Mar 20 01:51:08 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client d411c561-adfa-7f78-e534-2dbceed521e6 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8ed0f0d0e800, cur 1553071868 expire 1553071718 last 1553071641 Mar 20 01:53:19 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 2efa0721-d988-fc25-739d-9be810de2f6e (at 10.8.27.23@o2ib6) Mar 20 01:53:19 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Mar 20 02:23:49 fir-md1-s2 kernel: Lustre: 91411:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8edda779a400 x1626181405818496/t0(0) o101->3c080823-2f82-a8fd-3254-84e9f50418c5@10.9.101.6@o2ib4:24/0 lens 480/568 e 1 to 0 dl 1553073834 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 02:23:49 fir-md1-s2 kernel: Lustre: 91411:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Mar 20 02:23:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 3c080823-2f82-a8fd-3254-84e9f50418c5 (at 10.9.101.6@o2ib4) reconnecting Mar 20 02:23:55 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 89beb8b3-791a-f6c6-2375-0d31fb61e8aa (at 10.9.101.11@o2ib4) Mar 20 02:23:55 fir-md1-s2 kernel: Lustre: Skipped 22 previous similar messages Mar 20 02:24:04 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.11@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eea5a63a880/0xefacb2c28716e988 lrc: 3/0,0 mode: PW/PW res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x40/0x0 rrc: 43 type: IBT flags: 0x60200400000020 nid: 10.9.101.11@o2ib4 remote: 0x1b949336f306b48f expref: 51 pid: 90863 timeout: 918387 lvb_type: 0 Mar 20 02:24:04 fir-md1-s2 kernel: LustreError: 91509:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ee8af3aac00 ns: mdt-fir-MDT0001_UUID lock: ffff8edb7c209f80/0xefacb2c28716ea3e lrc: 3/0,0 mode: PW/PW res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x40/0x0 rrc: 40 type: IBT flags: 0x50200400000020 nid: 10.9.101.11@o2ib4 remote: 0x1b949336f306b496 expref: 28 pid: 91509 timeout: 0 lvb_type: 0 Mar 20 02:24:04 
fir-md1-s2 kernel: LustreError: 91509:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Mar 20 02:24:29 fir-md1-s2 kernel: Lustre: 91593:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec556ed9200 x1626131824321600/t0(0) o101->7f326dd9-6b3c-45f1-d1dc-35b8a959cac2@10.9.101.16@o2ib4:4/0 lens 600/3264 e 0 to 0 dl 1553073874 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 02:24:29 fir-md1-s2 kernel: Lustre: 91593:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 10 previous similar messages Mar 20 02:24:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 20 02:24:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 810fe316-e09a-254c-3020-2540e531f84e (at 10.9.101.7@o2ib4) Mar 20 02:24:35 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages Mar 20 02:24:35 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Mar 20 02:25:34 fir-md1-s2 kernel: LustreError: 91528:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553073844, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8edc302886c0/0xefacb2c28790ed6b lrc: 3/0,1 mode: --/PW res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x40/0x0 rrc: 47 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91528 timeout: 0 lvb_type: 0 Mar 20 02:25:34 fir-md1-s2 kernel: LustreError: 91361:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553073844, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed606ffcc80/0xefacb2c28790ede2 lrc: 3/1,0 mode: --/PR res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x13/0x8 rrc: 47 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91361 timeout: 0 lvb_type: 0 Mar 20 02:25:34 fir-md1-s2 
kernel: LustreError: 91361:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 7 previous similar messages Mar 20 02:25:34 fir-md1-s2 kernel: LustreError: 91528:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Mar 20 02:25:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 810fe316-e09a-254c-3020-2540e531f84e (at 10.9.101.7@o2ib4) reconnecting Mar 20 02:25:37 fir-md1-s2 kernel: Lustre: Skipped 13 previous similar messages Mar 20 02:26:08 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.9@o2ib4) Mar 20 02:26:08 fir-md1-s2 kernel: Lustre: Skipped 26 previous similar messages Mar 20 02:26:34 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.9@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ec1fd7f2880/0xefacb2c28790e935 lrc: 3/0,0 mode: PW/PW res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x40/0x0 rrc: 45 type: IBT flags: 0x60200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e556b3 expref: 91 pid: 91367 timeout: 918537 lvb_type: 0 Mar 20 02:26:59 fir-md1-s2 kernel: Lustre: 99936:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eef7c7fbc00 x1627854694307568/t0(0) o101->fa565bf9-4e3f-b131-86ef-18427c7a396c@10.9.101.24@o2ib4:4/0 lens 568/0 e 0 to 0 dl 1553074024 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 02:26:59 fir-md1-s2 kernel: Lustre: 99936:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 22 previous similar messages Mar 20 02:27:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 20 02:27:05 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages Mar 20 02:27:24 fir-md1-s2 kernel: LNet: Service thread pid 91371 was inactive for 200.25s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 02:27:24 fir-md1-s2 kernel: Pid: 91371, comm: mdt00_024 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 02:27:24 fir-md1-s2 kernel: Call Trace: Mar 20 02:27:24 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 02:27:24 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 02:27:24 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 02:27:24 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 02:27:24 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 02:27:24 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 02:27:24 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 02:27:24 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 02:27:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553074044.91371 Mar 20 02:27:26 fir-md1-s2 kernel: LNet: Service thread pid 91634 was inactive for 201.72s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 02:27:26 fir-md1-s2 kernel: Pid: 91634, comm: mdt03_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 02:27:26 fir-md1-s2 kernel: Call Trace: Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 02:27:26 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 02:27:26 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 02:27:26 fir-md1-s2 kernel: Pid: 91605, comm: mdt02_079 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 02:27:26 fir-md1-s2 kernel: Call Trace: Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 
02:27:26 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 02:27:26 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 02:27:26 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 02:27:26 fir-md1-s2 kernel: Pid: 91070, comm: mdt01_009 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 02:27:26 fir-md1-s2 kernel: Call Trace: Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 02:27:26 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 02:27:26 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 02:27:26 fir-md1-s2 kernel: Pid: 14607, comm: mdt00_089 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 
02:27:26 fir-md1-s2 kernel: Call Trace: Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 02:27:26 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 02:27:26 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 02:27:26 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 02:27:26 fir-md1-s2 kernel: LNet: Service thread pid 91583 was inactive for 202.19s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 02:28:04 fir-md1-s2 kernel: LustreError: 91651:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553073994, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec389203f00/0xefacb2c289de968f lrc: 3/1,0 mode: --/PR res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x13/0x8 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91651 timeout: 0 lvb_type: 0 Mar 20 02:28:04 fir-md1-s2 kernel: LustreError: 91651:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 02:28:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.16@o2ib4) Mar 20 02:28:43 fir-md1-s2 kernel: Lustre: Skipped 34 previous similar messages Mar 20 02:29:01 fir-md1-s2 kernel: Lustre: 91619:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb6d1362700 x1628036465046576/t0(0) o101->f2f4d52c-2807-60dd-8532-99fa3e9aeefa@10.0.10.3@o2ib7:6/0 lens 576/3264 e 0 to 0 dl 1553074146 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 02:29:01 fir-md1-s2 kernel: Lustre: 91619:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 20 02:29:04 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.24@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8edc302886c0/0xefacb2c28790ed6b lrc: 3/0,0 mode: PW/PW res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x40/0x0 rrc: 48 type: IBT flags: 0x60200400000020 nid: 10.9.101.24@o2ib4 remote: 0xa5d9cf5906310789 expref: 55 pid: 91528 timeout: 918687 lvb_type: 0 Mar 20 02:29:04 fir-md1-s2 kernel: LNet: Service thread pid 91605 completed after 299.93s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 02:29:04 fir-md1-s2 kernel: LustreError: 91503:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ed27c205800 ns: mdt-fir-MDT0001_UUID lock: ffff8edf643b6780/0xefacb2c28790ee3d lrc: 3/0,0 mode: PR/PR res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x1b/0x0 rrc: 45 type: IBT flags: 0x50200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e556c1 expref: 6 pid: 91503 timeout: 0 lvb_type: 0 Mar 20 02:29:04 fir-md1-s2 kernel: LustreError: 91503:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 02:29:04 fir-md1-s2 kernel: Lustre: 91503:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:146s); client may timeout. req@ffff8edd4761e300 x1626174038218000/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:4/0 lens 576/1168 e 0 to 0 dl 1553073998 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 02:29:04 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 02:29:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 836c1478-78ed-7dca-ef04-91a42cb19ef4 (at 10.9.101.23@o2ib4) reconnecting Mar 20 02:29:35 fir-md1-s2 kernel: Lustre: Skipped 32 previous similar messages Mar 20 02:29:54 fir-md1-s2 kernel: LNet: Service thread pid 91533 was inactive for 200.33s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 02:29:54 fir-md1-s2 kernel: LNet: Skipped 9 previous similar messages Mar 20 02:29:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553074194.91533 Mar 20 02:30:06 fir-md1-s2 kernel: LustreError: 91515:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553074116, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ebfa7355c40/0xefacb2c28bb0ed97 lrc: 3/1,0 mode: --/PR res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x12/0x0 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91515 timeout: 0 lvb_type: 0 Mar 20 02:30:34 fir-md1-s2 kernel: LustreError: 91068:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553074144, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec16c3cec00/0xefacb2c28c18a768 lrc: 3/1,0 mode: --/PR res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x20/0x0 rrc: 46 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91068 timeout: 0 lvb_type: 0 Mar 20 02:30:34 fir-md1-s2 kernel: LustreError: 91068:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 20 02:31:34 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.7@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed1ba689440/0xefacb2c28790ed9c lrc: 3/0,0 mode: PW/PW res: [0x240005ab2:0x1e49e:0x0].0x0 bits 0x40/0x0 rrc: 46 type: IBT flags: 0x60200400000020 nid: 10.9.101.7@o2ib4 remote: 0xc8b1122665040c84 expref: 82 pid: 91605 timeout: 918837 lvb_type: 0 Mar 20 02:31:34 fir-md1-s2 kernel: LustreError: 99934:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ed27c205800 ns: mdt-fir-MDT0001_UUID lock: ffff8ebbbca9f980/0xefacb2c28790efe1 lrc: 3/0,0 mode: PR/PR res: [0x240005ab2:0x1e49e:0x0].0x0 bits 
0x20/0x0 rrc: 41 type: IBT flags: 0x50200000000000 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e556c8 expref: 4 pid: 99934 timeout: 0 lvb_type: 0 Mar 20 02:31:34 fir-md1-s2 kernel: Lustre: 91070:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:296s); client may timeout. req@ffff8ec556e2aa00 x1626174038218032/t0(0) o55->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:4/0 lens 472/192 e 0 to 0 dl 1553073998 ref 1 fl Complete:/0/0 rc -22/-22 Mar 20 02:31:34 fir-md1-s2 kernel: LNet: Service thread pid 91634 completed after 449.93s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 03:07:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6) Mar 20 03:07:28 fir-md1-s2 kernel: Lustre: Skipped 40 previous similar messages Mar 20 03:22:11 fir-md1-s2 kernel: Lustre: 91518:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8edc97a95100 x1626167313726240/t0(0) o101->44242005-5dc4-8fcc-7a32-cd2d0aa5949c@10.8.3.20@o2ib6:16/0 lens 480/568 e 1 to 0 dl 1553077336 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 03:22:11 fir-md1-s2 kernel: Lustre: 91518:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Mar 20 03:38:27 fir-md1-s2 kernel: Lustre: 91476:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ec188a3ec00 x1626145415430720/t0(0) o101->89beb8b3-791a-f6c6-2375-0d31fb61e8aa@10.9.101.11@o2ib4:2/0 lens 576/3264 e 1 to 0 dl 1553078312 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 03:38:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 89beb8b3-791a-f6c6-2375-0d31fb61e8aa (at 10.9.101.11@o2ib4) reconnecting Mar 20 03:38:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 5535dd63-6e9b-7280-e906-63ff0acc4285 (at 10.9.101.37@o2ib4) Mar 20 03:38:33 fir-md1-s2 kernel: Lustre: Skipped 
29 previous similar messages Mar 20 03:38:37 fir-md1-s2 kernel: Lustre: 91593:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec188a3ad00 x1626698316571904/t0(0) o101->21c1d1e0-a07e-9376-435e-d7152822fd0e@10.9.101.20@o2ib4:12/0 lens 576/3264 e 0 to 0 dl 1553078322 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 03:38:37 fir-md1-s2 kernel: Lustre: 91593:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Mar 20 03:38:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.51@o2ib4) Mar 20 03:38:43 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Mar 20 03:38:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 5535dd63-6e9b-7280-e906-63ff0acc4285 (at 10.9.101.37@o2ib4) Mar 20 03:38:54 fir-md1-s2 kernel: Lustre: Skipped 11 previous similar messages Mar 20 03:39:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client c2df7375-12df-1926-64d0-f8c7d18d8591 (at 10.9.108.45@o2ib4) reconnecting Mar 20 03:39:14 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.51@o2ib4) Mar 20 03:39:14 fir-md1-s2 kernel: Lustre: Skipped 10 previous similar messages Mar 20 03:39:14 fir-md1-s2 kernel: Lustre: Skipped 36 previous similar messages Mar 20 03:39:42 fir-md1-s2 kernel: LustreError: 99933:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078292, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eb906b2fbc0/0xefacb2c2c48904d0 lrc: 3/0,1 mode: --/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 89 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 99933 timeout: 0 lvb_type: 0 Mar 20 03:39:42 fir-md1-s2 kernel: LustreError: 99933:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 36 previous similar messages Mar 20 03:39:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 
10.9.108.55@o2ib4) Mar 20 03:39:56 fir-md1-s2 kernel: Lustre: Skipped 45 previous similar messages Mar 20 03:40:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 148ae75a-e083-e480-3765-d63daa0c5525 (at 10.9.108.55@o2ib4) reconnecting Mar 20 03:40:38 fir-md1-s2 kernel: Lustre: Skipped 74 previous similar messages Mar 20 03:40:42 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.37@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eb06c539f80/0xefacb2c2c4890333 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 89 type: IBT flags: 0x60200400000020 nid: 10.9.101.37@o2ib4 remote: 0xcf080b98105da174 expref: 35192 pid: 14641 timeout: 922985 lvb_type: 0 Mar 20 03:41:07 fir-md1-s2 kernel: Lustre: 91259:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eeac7a5ad00 x1626180873719040/t0(0) o101->8f537e20-4055-050d-294f-70ac1a22847e@10.9.101.34@o2ib4:12/0 lens 568/0 e 0 to 0 dl 1553078472 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 03:41:07 fir-md1-s2 kernel: Lustre: 91259:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 11 previous similar messages Mar 20 03:41:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.34@o2ib4) Mar 20 03:41:13 fir-md1-s2 kernel: Lustre: Skipped 68 previous similar messages Mar 20 03:41:32 fir-md1-s2 kernel: LNet: Service thread pid 91345 was inactive for 200.28s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 03:41:32 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 03:41:32 fir-md1-s2 kernel: Pid: 91345, comm: mdt02_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:41:32 fir-md1-s2 kernel: Call Trace: Mar 20 03:41:32 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:41:33 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:41:33 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 03:41:33 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 03:41:33 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 03:41:33 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:41:33 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:41:33 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:41:33 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:41:33 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:41:33 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553078493.91345 Mar 20 03:41:34 fir-md1-s2 kernel: LNet: Service thread pid 90858 was inactive for 201.60s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 03:41:34 fir-md1-s2 kernel: Pid: 90858, comm: mdt01_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:41:34 fir-md1-s2 kernel: Call Trace: Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:41:34 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:41:34 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:41:34 fir-md1-s2 kernel: Pid: 91245, comm: mdt01_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:41:34 fir-md1-s2 kernel: Call Trace: Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 03:41:34 
fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:41:34 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:41:34 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:41:34 fir-md1-s2 kernel: Pid: 91521, comm: mdt02_067 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:41:34 fir-md1-s2 kernel: Call Trace: Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:41:34 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:41:34 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:41:34 fir-md1-s2 kernel: Pid: 91615, comm: mdt02_082 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:41:34 fir-md1-s2 kernel: Call Trace: Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:41:34 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:41:34 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:41:34 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:41:34 fir-md1-s2 kernel: LNet: Service thread pid 91452 was inactive for 202.11s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 03:41:34 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 03:42:12 fir-md1-s2 kernel: LustreError: 91181:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078442, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eec942fe780/0xefacb2c2c635018d lrc: 3/1,0 mode: --/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x20/0x0 rrc: 88 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91181 timeout: 0 lvb_type: 0 Mar 20 03:43:12 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.34@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eb77d7c6c00/0xefacb2c2c4890498 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 88 type: IBT flags: 0x60200400000020 nid: 10.9.101.34@o2ib4 remote: 0x3841a2e2f8a839fd expref: 77 pid: 91181 timeout: 923135 lvb_type: 0 Mar 20 03:43:12 fir-md1-s2 kernel: LNet: Service thread pid 99933 completed after 299.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 03:43:12 fir-md1-s2 kernel: LNet: Skipped 19 previous similar messages Mar 20 03:43:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7dbe5d6b-9a6b-7c31-ad8a-ef556ccf4c10 (at 10.9.101.51@o2ib4) reconnecting Mar 20 03:43:22 fir-md1-s2 kernel: Lustre: Skipped 142 previous similar messages Mar 20 03:43:37 fir-md1-s2 kernel: Lustre: 91502:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb578fa1e00 x1626282819280560/t0(0) o101->2b5ea6bb-9bbc-8164-5ec8-7cffe32aebcd@10.9.108.42@o2ib4:12/0 lens 568/0 e 0 to 0 dl 1553078622 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 03:43:44 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 03:43:44 fir-md1-s2 kernel: Lustre: Skipped 131 previous similar messages Mar 20 03:44:03 fir-md1-s2 kernel: LNet: Service thread pid 91181 was inactive for 200.52s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 03:44:03 fir-md1-s2 kernel: LNet: Skipped 31 previous similar messages Mar 20 03:44:03 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553078643.91181 Mar 20 03:44:42 fir-md1-s2 kernel: LustreError: 91442:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078592, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eba0ebdca40/0xefacb2c2c7799db9 lrc: 3/1,0 mode: --/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x20/0x0 rrc: 93 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91442 timeout: 0 lvb_type: 0 Mar 20 03:44:42 fir-md1-s2 kernel: LustreError: 91442:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 03:45:42 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.42@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eb906b2fbc0/0xefacb2c2c48904d0 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 93 type: IBT flags: 0x60200400000020 nid: 10.9.108.42@o2ib4 remote: 0x929aa80d14fb97cb expref: 17 pid: 99933 timeout: 923285 lvb_type: 0 Mar 20 03:45:42 fir-md1-s2 kernel: LNet: Service thread pid 91265 completed after 449.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 03:45:42 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 03:46:07 fir-md1-s2 kernel: Lustre: 91355:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ed3746bef00 x1626170866255008/t0(0) o101->476e933b-2664-4b57-53cf-d95b660fb2b3@10.9.101.5@o2ib4:12/0 lens 576/3264 e 0 to 0 dl 1553078772 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 03:46:07 fir-md1-s2 kernel: Lustre: 91355:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 5 previous similar messages Mar 20 03:46:12 fir-md1-s2 kernel: LNet: Service thread pid 91634 completed after 479.66s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 03:46:12 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 03:46:33 fir-md1-s2 kernel: LNet: Service thread pid 91351 was inactive for 200.60s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 03:46:33 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 03:46:33 fir-md1-s2 kernel: Pid: 91351, comm: mdt01_038 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:46:33 fir-md1-s2 kernel: Call Trace: Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 
03:46:33 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:46:33 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:46:33 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:46:33 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553078793.91351 Mar 20 03:46:33 fir-md1-s2 kernel: Pid: 91442, comm: mdt00_044 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:46:33 fir-md1-s2 kernel: Call Trace: Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:46:33 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:46:33 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:46:33 fir-md1-s2 
kernel: Pid: 91345, comm: mdt02_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:46:33 fir-md1-s2 kernel: Call Trace: Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:46:33 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:46:33 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:46:33 fir-md1-s2 kernel: Pid: 91587, comm: mdt03_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:46:33 fir-md1-s2 kernel: Call Trace: Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] 
Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:46:33 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:46:33 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:46:33 fir-md1-s2 kernel: Pid: 91608, comm: mdt00_071 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:46:33 fir-md1-s2 kernel: Call Trace: Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:46:33 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:46:33 
fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:46:33 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:46:33 fir-md1-s2 kernel: LNet: Service thread pid 91262 was inactive for 201.28s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 03:46:42 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.108.49@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfd08e5580/0xefacb2c2c48906b3 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 90 type: IBT flags: 0x60200400000020 nid: 10.9.108.49@o2ib4 remote: 0x4e240f9fd7fe9f7e expref: 31 pid: 91060 timeout: 923345 lvb_type: 0 Mar 20 03:46:42 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 03:46:42 fir-md1-s2 kernel: LNet: Service thread pid 14629 completed after 509.71s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 03:47:07 fir-md1-s2 kernel: Lustre: 91476:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec1743c6600 x1626174045729520/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:12/0 lens 568/0 e 0 to 0 dl 1553078832 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 03:47:07 fir-md1-s2 kernel: Lustre: 91476:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 20 03:47:12 fir-md1-s2 kernel: LustreError: 91380:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078742, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec24bbb69c0/0xefacb2c2c8e7b8b2 lrc: 3/1,0 mode: --/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x20/0x0 rrc: 89 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91380 timeout: 0 lvb_type: 0 Mar 20 03:47:12 fir-md1-s2 kernel: LustreError: 91380:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 20 03:47:42 fir-md1-s2 kernel: LustreError: 91237:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078772, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ebee94e4380/0xefacb2c2c936b995 lrc: 3/1,0 mode: --/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x20/0x0 rrc: 89 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91237 timeout: 0 lvb_type: 0 Mar 20 03:48:12 fir-md1-s2 kernel: LustreError: 91256:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078802, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecfcff27bc0/0xefacb2c2c988be0e lrc: 3/1,0 mode: --/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x20/0x0 rrc: 89 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 
pid: 91256 timeout: 0 lvb_type: 0 Mar 20 03:48:23 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 03:48:23 fir-md1-s2 kernel: Lustre: Skipped 240 previous similar messages Mar 20 03:48:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.9@o2ib4) Mar 20 03:48:46 fir-md1-s2 kernel: Lustre: Skipped 236 previous similar messages Mar 20 03:49:02 fir-md1-s2 kernel: LNet: Service thread pid 91380 was inactive for 200.11s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 03:49:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553078942.91380 Mar 20 03:49:12 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.9@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed1b8731b00/0xefacb2c2c48906d6 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 89 type: IBT flags: 0x60200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e5aee2 expref: 851 pid: 14629 timeout: 923495 lvb_type: 0 Mar 20 03:49:12 fir-md1-s2 kernel: LNet: Service thread pid 91521 completed after 659.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 03:49:12 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 03:49:29 fir-md1-s2 kernel: Lustre: 90855:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-657), not sending early reply req@ffff8eba9540ec00 x1627858255854416/t0(0) o101->8cf3b5bb-dd39-b59b-e6ac-1071ddb19c6e@10.9.108.54@o2ib4:4/0 lens 480/568 e 0 to 0 dl 1553078974 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 03:49:32 fir-md1-s2 kernel: LNet: Service thread pid 91237 was inactive for 200.35s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 03:49:32 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 03:49:32 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553078972.91237 Mar 20 03:50:02 fir-md1-s2 kernel: LNet: Service thread pid 91256 was inactive for 200.51s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 03:50:02 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553079002.91256 Mar 20 03:50:42 fir-md1-s2 kernel: LustreError: 91466:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553078952, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed47861c5c0/0xefacb2c2cb0cabbd lrc: 3/0,1 mode: --/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 78 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91466 timeout: 0 lvb_type: 0 Mar 20 03:50:42 fir-md1-s2 kernel: LustreError: 91466:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 20 03:51:42 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.45@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ecc8e739440/0xefacb2c2c4890777 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 78 type: IBT flags: 0x60200400000020 nid: 10.9.108.45@o2ib4 remote: 0x382e5ec80e8a748d expref: 1015 pid: 91245 timeout: 923645 lvb_type: 0 Mar 20 03:51:42 fir-md1-s2 kernel: LNet: Service thread pid 91364 completed after 809.70s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 03:51:42 fir-md1-s2 kernel: LNet: Skipped 5 previous similar messages Mar 20 03:52:07 fir-md1-s2 kernel: Lustre: 91568:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ed3392e3c00 x1626170880712000/t0(0) o101->98e83ed5-6d59-446c-8f7b-05df06bf758a@10.9.101.36@o2ib4:12/0 lens 576/3264 e 0 to 0 dl 1553079132 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 03:52:07 fir-md1-s2 kernel: Lustre: 91568:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages Mar 20 03:52:12 fir-md1-s2 kernel: LNet: Service thread pid 91268 completed after 839.66s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 03:52:12 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 03:52:32 fir-md1-s2 kernel: LNet: Service thread pid 91245 was inactive for 200.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 03:52:32 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 03:52:32 fir-md1-s2 kernel: Pid: 91245, comm: mdt01_021 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:52:32 fir-md1-s2 kernel: Call Trace: Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 
03:52:32 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:52:32 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:52:32 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:52:32 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553079152.91245 Mar 20 03:52:32 fir-md1-s2 kernel: Pid: 91579, comm: mdt00_065 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:52:32 fir-md1-s2 kernel: Call Trace: Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:52:32 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:52:32 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:52:32 fir-md1-s2 kernel: Pid: 91466, 
comm: mdt02_048 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:52:32 fir-md1-s2 kernel: Call Trace: Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:52:32 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:52:32 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:52:32 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:53:12 fir-md1-s2 kernel: LustreError: 91507:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553079102, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecf6bbec800/0xefacb2c2cc810a26 lrc: 3/1,0 mode: --/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x13/0x8 rrc: 78 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91507 timeout: 0 lvb_type: 0 Mar 20 03:53:12 fir-md1-s2 kernel: LustreError: 91507:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar 
messages Mar 20 03:54:42 fir-md1-s2 kernel: LNet: Service thread pid 91244 completed after 989.66s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 03:54:42 fir-md1-s2 kernel: LustreError: 91441:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eebb0679400 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfd820d7c0/0xefacb2c2c4890953 lrc: 3/0,0 mode: PW/PW res: [0x24000ecde:0x10f:0x0].0x0 bits 0x40/0x0 rrc: 68 type: IBT flags: 0x50200400000020 nid: 10.9.101.37@o2ib4 remote: 0xcf080b98105da182 expref: 7 pid: 91441 timeout: 0 lvb_type: 0 Mar 20 03:54:42 fir-md1-s2 kernel: LustreError: 91441:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Mar 20 03:54:42 fir-md1-s2 kernel: Lustre: 91441:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (167:823s); client may timeout. req@ffff8ec187ed4200 x1627854540433904/t0(0) o101->5535dd63-6e9b-7280-e906-63ff0acc4285@10.9.101.37@o2ib4:2/0 lens 480/536 e 1 to 0 dl 1553078459 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 03:54:42 fir-md1-s2 kernel: Lustre: 91441:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Mar 20 03:54:42 fir-md1-s2 kernel: LNet: Skipped 5 previous similar messages Mar 20 03:55:02 fir-md1-s2 kernel: LNet: Service thread pid 91240 was inactive for 200.55s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 03:55:02 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 20 03:55:02 fir-md1-s2 kernel: Pid: 91240, comm: mdt03_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:55:02 fir-md1-s2 kernel: Call Trace: Mar 20 03:55:02 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:55:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:55:02 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:55:02 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:55:03 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:55:03 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:55:03 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553079303.91240 Mar 20 03:55:03 fir-md1-s2 kernel: Pid: 91507, comm: mdt00_054 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 03:55:03 fir-md1-s2 kernel: Call Trace: Mar 20 03:55:03 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] 
mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 03:55:03 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 03:55:03 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 03:55:03 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 03:55:03 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 03:55:03 fir-md1-s2 kernel: LNet: Service thread pid 91427 was inactive for 200.90s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 03:55:32 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553079332.91526 Mar 20 03:58:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 98e83ed5-6d59-446c-8f7b-05df06bf758a (at 10.9.101.36@o2ib4) reconnecting Mar 20 03:58:25 fir-md1-s2 kernel: Lustre: Skipped 311 previous similar messages Mar 20 03:58:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.55@o2ib4) Mar 20 03:58:46 fir-md1-s2 kernel: Lustre: Skipped 310 previous similar messages Mar 20 04:08:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 476e933b-2664-4b57-53cf-d95b660fb2b3 (at 10.9.101.5@o2ib4) reconnecting Mar 20 04:08:26 fir-md1-s2 kernel: Lustre: Skipped 249 previous similar messages Mar 20 04:08:57 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 476e933b-2664-4b57-53cf-d95b660fb2b3 (at 10.9.101.5@o2ib4) Mar 20 04:08:57 fir-md1-s2 kernel: Lustre: Skipped 252 previous similar messages Mar 20 04:18:27 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 89beb8b3-791a-f6c6-2375-0d31fb61e8aa (at 10.9.101.11@o2ib4) reconnecting Mar 20 04:18:27 fir-md1-s2 kernel: Lustre: Skipped 251 previous similar messages Mar 20 04:19:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.51@o2ib4) Mar 20 04:19:02 fir-md1-s2 kernel: Lustre: Skipped 253 previous similar messages Mar 20 04:28:35 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 476e933b-2664-4b57-53cf-d95b660fb2b3 (at 10.9.101.5@o2ib4) reconnecting Mar 20 04:28:35 fir-md1-s2 kernel: Lustre: Skipped 251 previous similar messages Mar 20 04:29:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 476e933b-2664-4b57-53cf-d95b660fb2b3 (at 10.9.101.5@o2ib4) Mar 20 04:29:06 fir-md1-s2 kernel: Lustre: Skipped 251 previous similar messages Mar 20 04:38:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7dbe5d6b-9a6b-7c31-ad8a-ef556ccf4c10 (at 10.9.101.51@o2ib4) reconnecting Mar 20 04:38:40 fir-md1-s2 kernel: Lustre: Skipped 253 previous similar messages Mar 20 04:39:07 
fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 89beb8b3-791a-f6c6-2375-0d31fb61e8aa (at 10.9.101.11@o2ib4) Mar 20 04:39:07 fir-md1-s2 kernel: Lustre: Skipped 251 previous similar messages Mar 20 04:43:16 fir-md1-s2 kernel: LNet: Service thread pid 91262 completed after 3603.54s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 04:43:16 fir-md1-s2 kernel: Lustre: 91216:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (167:3736s); client may timeout. req@ffff8eedd2ab9e00 x1627854540433968/t0(0) o55->5535dd63-6e9b-7280-e906-63ff0acc4285@10.9.101.37@o2ib4:2/0 lens 472/192 e 1 to 0 dl 1553078459 ref 1 fl Complete:/0/0 rc -22/-22 Mar 20 04:43:16 fir-md1-s2 kernel: Lustre: 91216:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 20 04:43:16 fir-md1-s2 kernel: LustreError: 91256:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eed9126b000 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfcff27bc0/0xefacb2c2c988be0e lrc: 3/0,0 mode: PR/PR res: [0x24000ecde:0x10f:0x0].0x0 bits 0x20/0x0 rrc: 58 type: IBT flags: 0x50200000000000 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e5aee9 expref: 2 pid: 91256 timeout: 0 lvb_type: 0 Mar 20 04:43:16 fir-md1-s2 kernel: LustreError: 91256:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 04:43:16 fir-md1-s2 kernel: LNet: Skipped 23 previous similar messages Mar 20 05:20:46 fir-md1-s2 kernel: Lustre: 91265:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ecbcded9500 x1627854565293296/t112474941905(0) o36->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:21/0 lens 488/3152 e 1 to 0 dl 1553084451 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 05:20:46 fir-md1-s2 kernel: Lustre: 91265:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages 
Mar 20 05:20:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1269d82d-2021-a198-cfa7-174e25a867c3 (at 10.9.101.18@o2ib4) reconnecting Mar 20 05:20:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.23@o2ib4) Mar 20 05:20:52 fir-md1-s2 kernel: Lustre: Skipped 103 previous similar messages Mar 20 05:20:52 fir-md1-s2 kernel: Lustre: Skipped 122 previous similar messages Mar 20 05:22:01 fir-md1-s2 kernel: LustreError: 91579:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553084431, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8edb6c54bcc0/0xefacb2c2ff3bf3a8 lrc: 3/0,1 mode: --/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 59 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91579 timeout: 0 lvb_type: 0 Mar 20 05:22:01 fir-md1-s2 kernel: LustreError: 91579:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 24 previous similar messages Mar 20 05:22:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 836c1478-78ed-7dca-ef04-91a42cb19ef4 (at 10.9.101.23@o2ib4) reconnecting Mar 20 05:22:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.1@o2ib4) Mar 20 05:22:16 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 05:22:16 fir-md1-s2 kernel: Lustre: Skipped 35 previous similar messages Mar 20 05:23:01 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.7@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eed79367bc0/0xefacb2c2ff3bf34d lrc: 3/0,0 mode: PW/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 59 type: IBT flags: 0x60200400000020 nid: 10.9.101.7@o2ib4 remote: 0xc8b112266504c47c expref: 1145 pid: 91259 timeout: 929124 lvb_type: 0 Mar 20 05:23:01 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar 
messages Mar 20 05:23:01 fir-md1-s2 kernel: LustreError: 91245:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ec9caaff400 ns: mdt-fir-MDT0001_UUID lock: ffff8eced7057500/0xefacb2c2ff3bf85c lrc: 3/0,0 mode: PR/PR res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x20/0x0 rrc: 52 type: IBT flags: 0x50200000000000 nid: 10.9.101.7@o2ib4 remote: 0xc8b112266504c498 expref: 9 pid: 91245 timeout: 0 lvb_type: 0 Mar 20 05:23:01 fir-md1-s2 kernel: LustreError: 91245:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 9 previous similar messages Mar 20 05:23:26 fir-md1-s2 kernel: Lustre: 91526:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec1bfeead00 x1627853436122720/t0(0) o101->b436319f-69f7-dd07-932a-7b1aa38ec017@10.9.101.12@o2ib4:1/0 lens 480/568 e 0 to 0 dl 1553084611 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 05:23:26 fir-md1-s2 kernel: Lustre: 91526:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 38 previous similar messages Mar 20 05:24:31 fir-md1-s2 kernel: LustreError: 91565:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553084581, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eead43f9b00/0xefacb2c300aa64fb lrc: 3/0,1 mode: --/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 51 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91565 timeout: 0 lvb_type: 0 Mar 20 05:24:31 fir-md1-s2 kernel: LustreError: 91565:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 20 previous similar messages Mar 20 05:25:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 56f5734c-97c1-4112-5a94-6bf865e15363 (at 10.9.101.1@o2ib4) reconnecting Mar 20 05:25:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.18@o2ib4) Mar 20 05:25:05 fir-md1-s2 kernel: Lustre: Skipped 57 previous similar messages Mar 20 05:25:05 fir-md1-s2 kernel: 
Lustre: Skipped 57 previous similar messages Mar 20 05:25:31 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.18@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed00edc7bc0/0xefacb2c300aa47b2 lrc: 3/0,0 mode: PW/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 51 type: IBT flags: 0x60200400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb44439795 expref: 51 pid: 91608 timeout: 929274 lvb_type: 0 Mar 20 05:25:31 fir-md1-s2 kernel: LustreError: 91615:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ebfeaa10c00 ns: mdt-fir-MDT0001_UUID lock: ffff8ee01df27500/0xefacb2c300aa660c lrc: 3/0,0 mode: --/PR res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x1b/0x0 rrc: 47 type: IBT flags: 0x54a01400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb444397b1 expref: 23 pid: 91615 timeout: 0 lvb_type: 0 Mar 20 05:25:31 fir-md1-s2 kernel: LustreError: 91615:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 05:25:56 fir-md1-s2 kernel: Lustre: 91532:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eebf8f4ec00 x1627854565328432/t0(0) o101->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:1/0 lens 600/3264 e 0 to 0 dl 1553084761 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 05:25:56 fir-md1-s2 kernel: Lustre: 91532:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 20 previous similar messages Mar 20 05:27:01 fir-md1-s2 kernel: LustreError: 91534:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553084731, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eebc5e2d7c0/0xefacb2c30227f1b9 lrc: 3/0,1 mode: --/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 53 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91534 timeout: 0 lvb_type: 0 Mar 
20 05:27:01 fir-md1-s2 kernel: LustreError: 91534:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 20 previous similar messages Mar 20 05:28:31 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.16@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eeaad6bb3c0/0xefacb2c30227f1ab lrc: 3/0,0 mode: PW/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 53 type: IBT flags: 0x60200400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643a9cf04a expref: 1175 pid: 91495 timeout: 929454 lvb_type: 0 Mar 20 05:28:31 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 05:28:31 fir-md1-s2 kernel: LustreError: 14628:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ec580ee3800 ns: mdt-fir-MDT0001_UUID lock: ffff8eba5826f980/0xefacb2c30227f1ea lrc: 3/0,0 mode: PW/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 49 type: IBT flags: 0x50200400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643a9cf051 expref: 666 pid: 14628 timeout: 0 lvb_type: 0 Mar 20 05:28:31 fir-md1-s2 kernel: LustreError: 14628:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 05:28:31 fir-md1-s2 kernel: Lustre: 99936:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:150s); client may timeout. 
req@ffff8ed380617500 x1626186690459888/t0(0) o101->15a58f54-b525-70df-3389-43a1f6518031@10.9.101.17@o2ib4:1/0 lens 568/1688 e 0 to 0 dl 1553084761 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 05:28:31 fir-md1-s2 kernel: Lustre: 99936:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 10 previous similar messages Mar 20 05:28:56 fir-md1-s2 kernel: Lustre: 91449:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ebe4dea0900 x1627853436165792/t0(0) o101->b436319f-69f7-dd07-932a-7b1aa38ec017@10.9.101.12@o2ib4:1/0 lens 568/0 e 0 to 0 dl 1553084941 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 05:28:56 fir-md1-s2 kernel: Lustre: 91449:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 24 previous similar messages Mar 20 05:29:31 fir-md1-s2 kernel: LustreError: 91181:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eec7778c000 ns: mdt-fir-MDT0001_UUID lock: ffff8ec4fc72b3c0/0xefacb2c303ea9c32 lrc: 3/0,0 mode: PW/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 27 type: IBT flags: 0x50200400000020 nid: 10.9.101.12@o2ib4 remote: 0xcd92587e7074bc5a expref: 7 pid: 91181 timeout: 0 lvb_type: 0 Mar 20 05:29:31 fir-md1-s2 kernel: LustreError: 91181:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 05:29:31 fir-md1-s2 kernel: Lustre: 91447:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:30s); client may timeout. 
req@ffff8ec376e74500 x1626160149798784/t0(0) o101->251ee163-1302-af54-569b-06691a518665@10.9.101.3@o2ib4:1/0 lens 568/1688 e 0 to 0 dl 1553084941 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 05:30:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 836c1478-78ed-7dca-ef04-91a42cb19ef4 (at 10.9.101.23@o2ib4) reconnecting Mar 20 05:30:33 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f6951712-d609-35d5-7d48-8ccf309c970b (at 10.9.101.19@o2ib4) Mar 20 05:30:33 fir-md1-s2 kernel: Lustre: Skipped 69 previous similar messages Mar 20 05:30:33 fir-md1-s2 kernel: Lustre: Skipped 61 previous similar messages Mar 20 05:31:01 fir-md1-s2 kernel: LustreError: 91615:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553084971, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed2b2fe8000/0xefacb2c3047bc8cf lrc: 3/0,1 mode: --/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91615 timeout: 0 lvb_type: 0 Mar 20 05:31:01 fir-md1-s2 kernel: LustreError: 91615:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages Mar 20 05:32:01 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.18@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ee8caf34140/0xefacb2c3047bc755 lrc: 3/0,0 mode: PW/PW res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x40/0x0 rrc: 19 type: IBT flags: 0x60200400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb44439af2 expref: 32 pid: 91358 timeout: 929664 lvb_type: 0 Mar 20 05:32:01 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 2 previous similar messages Mar 20 05:32:31 fir-md1-s2 kernel: LustreError: 91465:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eebb3282000 ns: mdt-fir-MDT0001_UUID lock: 
ffff8ec536f76e40/0xefacb2c3047bca81 lrc: 3/0,0 mode: PR/PR res: [0x24000ed04:0x2aa:0x0].0x0 bits 0x20/0x0 rrc: 14 type: IBT flags: 0x50200000000000 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb44439af9 expref: 2 pid: 91465 timeout: 0 lvb_type: 0 Mar 20 05:32:31 fir-md1-s2 kernel: LustreError: 91465:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 20 05:32:31 fir-md1-s2 kernel: Lustre: 91465:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:26s); client may timeout. req@ffff8ed44b2dc500 x1627854565352080/t0(0) o101->1269d82d-2021-a198-cfa7-174e25a867c3@10.9.101.18@o2ib4:1/0 lens 568/1688 e 0 to 0 dl 1553085125 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 06:16:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 9e716467-9dbf-5768-e1d9-bc12bbba7e1b (at 10.8.11.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ec164e87400, cur 1553087767 expire 1553087617 last 1553087540 Mar 20 06:16:07 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Mar 20 06:18:56 fir-md1-s2 kernel: Lustre: 14624:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:23:55 fir-md1-s2 kernel: Lustre: 91496:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:26:01 fir-md1-s2 kernel: Lustre: 91366:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:27:01 fir-md1-s2 kernel: Lustre: 91599:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:27:01 fir-md1-s2 kernel: Lustre: 91599:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Mar 20 06:28:38 fir-md1-s2 kernel: Lustre: 14628:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:28:38 fir-md1-s2 
kernel: Lustre: 14628:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Mar 20 06:28:47 fir-md1-s2 kernel: Lustre: 91538:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:28:47 fir-md1-s2 kernel: Lustre: 91538:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 3 previous similar messages Mar 20 06:29:24 fir-md1-s2 kernel: Lustre: 91446:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:29:24 fir-md1-s2 kernel: Lustre: 91446:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 2 previous similar messages Mar 20 06:29:38 fir-md1-s2 kernel: LNetError: 90654:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Mar 20 06:31:37 fir-md1-s2 kernel: Lustre: 91377:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:31:37 fir-md1-s2 kernel: Lustre: 91377:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Mar 20 06:32:50 fir-md1-s2 kernel: Lustre: 91673:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:32:50 fir-md1-s2 kernel: Lustre: 91673:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 1 previous similar message Mar 20 06:35:21 fir-md1-s2 kernel: Lustre: 91507:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:35:21 fir-md1-s2 kernel: Lustre: 91507:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 8 previous similar messages Mar 20 06:35:41 fir-md1-s2 kernel: Lustre: 91502:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ef077239200 x1628201812650208/t0(0) o101->debb416f-0fe0-cd8f-042c-05dad2a71948@10.8.6.29@o2ib6:16/0 lens 480/568 e 1 to 0 dl 1553088946 ref 2 fl Interpret:/0/0 rc 0/0 
Mar 20 06:35:41 fir-md1-s2 kernel: Lustre: 91502:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 38 previous similar messages Mar 20 06:35:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 3e7bdb61-7d03-0856-ad48-42f575f88c45 (at 10.8.3.34@o2ib6) reconnecting Mar 20 06:35:47 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.3.19@o2ib6) Mar 20 06:35:47 fir-md1-s2 kernel: Lustre: Skipped 19 previous similar messages Mar 20 06:35:47 fir-md1-s2 kernel: Lustre: Skipped 22 previous similar messages Mar 20 06:35:56 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.8.3.36@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ec9ac246300/0xefacb2c3479906b8 lrc: 3/0,0 mode: PW/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 88 type: IBT flags: 0x60200400000020 nid: 10.8.3.36@o2ib6 remote: 0x7b02a2d2b581e459 expref: 1742 pid: 91431 timeout: 933499 lvb_type: 0 Mar 20 06:35:56 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 06:36:21 fir-md1-s2 kernel: Lustre: 91620:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb8980dd400 x1628373123521552/t0(0) o101->176f37b7-0202-8b9c-67fe-4ffe9dd08f19@10.8.3.5@o2ib6:26/0 lens 568/0 e 0 to 0 dl 1553088986 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 06:36:21 fir-md1-s2 kernel: Lustre: 91620:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Mar 20 06:36:29 fir-md1-s2 kernel: Lustre: 91496:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (62:1s); client may timeout. 
req@ffff8eb7a7590000 x1626167729750864/t0(0) o101->cee702f6-e89d-9dd2-9c7a-a601e6ab4052@10.8.3.19@o2ib6:16/0 lens 480/536 e 1 to 0 dl 1553088988 ref 1 fl Complete:/0/0 rc 0/0 Mar 20 06:36:29 fir-md1-s2 kernel: Lustre: 91496:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Mar 20 06:36:56 fir-md1-s2 kernel: LustreError: 91545:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553088926, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec61ee4b840/0xefacb2c34799188c lrc: 3/0,1 mode: --/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 85 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91545 timeout: 0 lvb_type: 0 Mar 20 06:36:56 fir-md1-s2 kernel: LustreError: 91545:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 21 previous similar messages Mar 20 06:36:58 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.3.27@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ec5a4e7f980/0xefacb2c347991100 lrc: 3/0,0 mode: PW/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 83 type: IBT flags: 0x60200400000020 nid: 10.8.3.27@o2ib6 remote: 0xe28779d6fec6aeae expref: 23 pid: 91065 timeout: 933561 lvb_type: 0 Mar 20 06:36:58 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 06:37:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 075e30df-93a5-2467-3241-a81e9d8dd63a (at 10.8.3.27@o2ib6) Mar 20 06:37:11 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 5657a600-a491-e40c-8bc7-921d13a84c42 (at 10.8.3.31@o2ib6) reconnecting Mar 20 06:37:11 fir-md1-s2 kernel: Lustre: Skipped 51 previous similar messages Mar 20 06:37:11 fir-md1-s2 kernel: Lustre: Skipped 62 previous similar messages Mar 20 06:37:26 fir-md1-s2 kernel: LustreError: 
91442:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553088956, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ebd3fe7b840/0xefacb2c3487e7b8c lrc: 3/1,0 mode: --/PR res: [0x24000de34:0x9031:0x0].0x0 bits 0x13/0x8 rrc: 82 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91442 timeout: 0 lvb_type: 0 Mar 20 06:37:26 fir-md1-s2 kernel: LustreError: 91442:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 6 previous similar messages Mar 20 06:37:46 fir-md1-s2 kernel: Lustre: 91526:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-28), not sending early reply req@ffff8ec52eab3300 x1628102348819984/t0(0) o101->3e7bdb61-7d03-0856-ad48-42f575f88c45@10.8.3.34@o2ib6:21/0 lens 568/0 e 0 to 0 dl 1553089071 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 06:37:46 fir-md1-s2 kernel: Lustre: 91526:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 9 previous similar messages Mar 20 06:37:59 fir-md1-s2 kernel: LustreError: 91621:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553088989, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8edb376ea640/0xefacb2c3497dac55 lrc: 3/1,0 mode: --/PR res: [0x24000de34:0x9031:0x0].0x0 bits 0x13/0x8 rrc: 82 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91621 timeout: 0 lvb_type: 0 Mar 20 06:37:59 fir-md1-s2 kernel: LustreError: 91621:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 06:38:47 fir-md1-s2 kernel: LNet: Service thread pid 91222 was inactive for 200.64s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 06:38:47 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 06:38:47 fir-md1-s2 kernel: Pid: 91222, comm: mdt01_017 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:38:47 fir-md1-s2 kernel: Call Trace: Mar 20 06:38:47 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:38:47 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:38:47 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 06:38:47 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 06:38:47 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 06:38:47 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:38:47 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:38:47 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:38:47 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:38:47 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:38:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553089127.91222 Mar 20 06:38:48 fir-md1-s2 kernel: LNet: Service thread pid 90859 was inactive for 202.17s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 06:38:48 fir-md1-s2 kernel: Pid: 90859, comm: mdt01_002 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:38:48 fir-md1-s2 kernel: Call Trace: Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:38:48 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:38:48 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:38:48 fir-md1-s2 kernel: Pid: 91252, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:38:48 fir-md1-s2 kernel: Call Trace: Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:38:48 
fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 20 06:38:48 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:38:48 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:38:48 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:38:48 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:38:48 fir-md1-s2 kernel: Pid: 91611, comm: mdt01_109 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:38:48 fir-md1-s2 kernel: Call Trace: Mar 20 06:38:48 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] 
kthread+0xd1/0xe0 Mar 20 06:38:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:38:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:38:49 fir-md1-s2 kernel: Pid: 91432, comm: mdt01_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:38:49 fir-md1-s2 kernel: Call Trace: Mar 20 06:38:49 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Mar 20 06:38:49 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:38:49 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:38:49 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:38:49 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:38:49 fir-md1-s2 kernel: LNet: Service thread pid 90857 was inactive for 202.63s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 06:38:49 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 06:39:16 fir-md1-s2 kernel: LNet: Service thread pid 91409 was inactive for 200.22s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 06:39:16 fir-md1-s2 kernel: LNet: Skipped 16 previous similar messages Mar 20 06:39:16 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553089156.91409 Mar 20 06:39:28 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.3.34@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ecc267be780/0xefacb2c3479914b1 lrc: 3/0,0 mode: PW/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 84 type: IBT flags: 0x60200400000020 nid: 10.8.3.34@o2ib6 remote: 0xab6513b52e018294 expref: 21 pid: 91245 timeout: 933711 lvb_type: 0 Mar 20 06:39:28 fir-md1-s2 kernel: LustreError: 91480:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ec1f434f800 ns: mdt-fir-MDT0001_UUID lock: ffff8ed01fa5bf00/0xefacb2c34799159f lrc: 3/0,0 mode: PW/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 81 type: IBT flags: 0x50200400000020 nid: 10.8.3.30@o2ib6 remote: 0x4fa3d464e4ae4b01 expref: 5 pid: 91480 timeout: 0 lvb_type: 0 Mar 20 06:39:28 fir-md1-s2 kernel: LustreError: 91480:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 06:39:28 fir-md1-s2 kernel: Lustre: 91480:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (62:180s); client may timeout. req@ffff8ec481e32400 x1628361117212304/t0(0) o101->27ef6658-3f92-cdfd-8d7e-76544a62a641@10.8.3.30@o2ib6:16/0 lens 480/536 e 1 to 0 dl 1553088988 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 06:39:28 fir-md1-s2 kernel: LNet: Service thread pid 14635 completed after 241.87s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 06:39:28 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 20 06:39:28 fir-md1-s2 kernel: Lustre: 91480:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 20 06:39:49 fir-md1-s2 kernel: LNet: Service thread pid 91672 was inactive for 200.48s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 06:39:49 fir-md1-s2 kernel: LNet: Skipped 6 previous similar messages Mar 20 06:39:49 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553089189.91672 Mar 20 06:39:50 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 3e7bdb61-7d03-0856-ad48-42f575f88c45 (at 10.8.3.34@o2ib6) Mar 20 06:39:50 fir-md1-s2 kernel: Lustre: Skipped 93 previous similar messages Mar 20 06:39:52 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 06:39:52 fir-md1-s2 kernel: Lustre: Skipped 94 previous similar messages Mar 20 06:39:58 fir-md1-s2 kernel: LNet: Service thread pid 91566 completed after 271.79s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 06:39:58 fir-md1-s2 kernel: LNet: Skipped 7 previous similar messages Mar 20 06:40:03 fir-md1-s2 kernel: Lustre: 91055:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-227), not sending early reply req@ffff8ec699b22d00 x1628201812657904/t0(0) o55->debb416f-0fe0-cd8f-042c-05dad2a71948@10.8.6.29@o2ib6:8/0 lens 472/224 e 0 to 0 dl 1553089208 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 06:40:03 fir-md1-s2 kernel: Lustre: 91055:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 4 previous similar messages Mar 20 06:40:18 fir-md1-s2 kernel: LNet: Service thread pid 91064 was inactive for 200.44s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 06:40:18 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 06:40:18 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553089218.91064 Mar 20 06:40:20 fir-md1-s2 kernel: LustreError: 14607:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553089130, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ebe72ed3f00/0xefacb2c34dc0f82a lrc: 3/1,0 mode: --/PR res: [0x24000de34:0x9031:0x0].0x0 bits 0x12/0x0 rrc: 77 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 14607 timeout: 0 lvb_type: 0 Mar 20 06:40:20 fir-md1-s2 kernel: LustreError: 14607:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Mar 20 06:40:28 fir-md1-s2 kernel: Lustre: 91586:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (104:198s); client may timeout. req@ffff8ec1a5fbb600 x1626135947321840/t0(0) o55->075e30df-93a5-2467-3241-a81e9d8dd63a@10.8.3.27@o2ib6:16/0 lens 472/192 e 1 to 0 dl 1553089030 ref 1 fl Complete:/0/0 rc -22/-22 Mar 20 06:40:28 fir-md1-s2 kernel: LNet: Service thread pid 91355 completed after 302.17s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 06:40:28 fir-md1-s2 kernel: LustreError: 91066:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eea5ceb1400 ns: mdt-fir-MDT0001_UUID lock: ffff8ecec1678240/0xefacb2c3479928ca lrc: 3/0,0 mode: PW/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 70 type: IBT flags: 0x50200400000020 nid: 10.8.3.27@o2ib6 remote: 0xe28779d6fec6aebc expref: 7 pid: 91066 timeout: 0 lvb_type: 0 Mar 20 06:40:28 fir-md1-s2 kernel: Lustre: 91586:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Mar 20 06:40:34 fir-md1-s2 kernel: LustreError: 91357:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eefcb45c400 ns: mdt-fir-MDT0001_UUID lock: ffff8ed995d0de80/0xefacb2c347992ad7 lrc: 3/0,0 mode: PW/PW res: [0x24000de34:0x9031:0x0].0x0 bits 0x40/0x0 rrc: 64 type: IBT flags: 0x50200400000020 nid: 10.8.3.2@o2ib6 remote: 0xcb007b693335d9d8 expref: 1458 pid: 91357 timeout: 0 lvb_type: 0 Mar 20 06:40:34 fir-md1-s2 kernel: LustreError: 91357:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 20 06:40:34 fir-md1-s2 kernel: LNet: Service thread pid 91357 completed after 307.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 06:40:34 fir-md1-s2 kernel: Lustre: 91507:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (272:36s); client may timeout. 
req@ffff8eba6d480900 x1626189305863984/t0(0) o101->5657a600-a491-e40c-8bc7-921d13a84c42@10.8.3.31@o2ib6:16/0 lens 576/1168 e 1 to 0 dl 1553089198 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 06:40:34 fir-md1-s2 kernel: LNet: Skipped 23 previous similar messages Mar 20 06:40:47 fir-md1-s2 kernel: Lustre: 91371:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:40:47 fir-md1-s2 kernel: Lustre: 91371:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 7 previous similar messages Mar 20 06:41:35 fir-md1-s2 kernel: LustreError: 91468:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eea152a6c00 ns: mdt-fir-MDT0001_UUID lock: ffff8ec886b75580/0xefacb2c350f65f04 lrc: 3/0,0 mode: PR/PR res: [0x24000ed30:0x190c:0x0].0x0 bits 0x1b/0x0 rrc: 26 type: IBT flags: 0x50200400000020 nid: 10.8.3.19@o2ib6 remote: 0x48582971d201e885 expref: 4 pid: 91468 timeout: 0 lvb_type: 0 Mar 20 06:41:35 fir-md1-s2 kernel: LustreError: 91468:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 7 previous similar messages Mar 20 06:41:35 fir-md1-s2 kernel: Lustre: 91468:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:30s); client may timeout. 
req@ffff8ebe709d0000 x1626167731126320/t0(0) o101->cee702f6-e89d-9dd2-9c7a-a601e6ab4052@10.8.3.19@o2ib6:5/0 lens 592/1168 e 0 to 0 dl 1553089265 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 06:41:35 fir-md1-s2 kernel: Lustre: 91468:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 7 previous similar messages Mar 20 06:43:05 fir-md1-s2 kernel: LustreError: 91621:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553089295, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8edd2127ad00/0xefacb2c352d0bf41 lrc: 3/0,1 mode: --/PW res: [0x24000ed30:0x190c:0x0].0x0 bits 0x40/0x0 rrc: 34 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91621 timeout: 0 lvb_type: 0 Mar 20 06:43:05 fir-md1-s2 kernel: LustreError: 91621:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 06:44:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.3.34@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfa8a5f740/0xefacb2c352d0bc93 lrc: 3/0,0 mode: PW/PW res: [0x24000ed30:0x190c:0x0].0x0 bits 0x40/0x0 rrc: 36 type: IBT flags: 0x60200400000020 nid: 10.8.3.34@o2ib6 remote: 0xab6513b52e018dca expref: 45 pid: 91256 timeout: 933988 lvb_type: 0 Mar 20 06:44:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Mar 20 06:44:30 fir-md1-s2 kernel: Lustre: 91256:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ed3d63ee600 x1628124057284544/t0(0) o101->31e2cd6e-16e4-6dad-21cc-5cce31cb70de@10.8.3.8@o2ib6:5/0 lens 568/0 e 0 to 0 dl 1553089475 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 06:44:30 fir-md1-s2 kernel: Lustre: 91256:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 29 previous similar messages Mar 20 
06:44:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 06:44:51 fir-md1-s2 kernel: Lustre: Skipped 86 previous similar messages Mar 20 06:44:55 fir-md1-s2 kernel: LNet: Service thread pid 91252 was inactive for 200.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 06:44:55 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 06:44:55 fir-md1-s2 kernel: Pid: 91252, comm: mdt01_025 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:44:55 fir-md1-s2 kernel: Call Trace: Mar 20 06:44:55 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:44:55 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:44:55 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 06:44:55 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 06:44:55 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 06:44:55 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:44:55 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:44:55 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:44:55 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:44:55 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:44:55 fir-md1-s2 kernel: 
LustreError: dumping log to /tmp/lustre-log.1553089495.91252 Mar 20 06:44:57 fir-md1-s2 kernel: Pid: 91597, comm: mdt01_106 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:44:57 fir-md1-s2 kernel: Call Trace: Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:44:57 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:44:57 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:44:57 fir-md1-s2 kernel: Pid: 14612, comm: mdt00_094 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:44:57 fir-md1-s2 kernel: Call Trace: Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 
20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:44:57 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:44:57 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:44:57 fir-md1-s2 kernel: Pid: 91576, comm: mdt01_103 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:44:57 fir-md1-s2 kernel: Call Trace: Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 
20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:44:57 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:44:57 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:44:57 fir-md1-s2 kernel: Pid: 91628, comm: mdt00_075 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 06:44:57 fir-md1-s2 kernel: Call Trace: Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 06:44:57 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 06:44:57 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 06:44:57 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 06:44:57 fir-md1-s2 kernel: LNet: Service thread pid 91260 was inactive for 202.03s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 06:45:05 fir-md1-s2 kernel: LNet: Service thread pid 91597 completed after 209.63s. 
This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 06:45:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 6bd865bf-f4c2-4c15-6221-33ec5e006a7a (at 10.8.3.26@o2ib6) reconnecting Mar 20 06:45:12 fir-md1-s2 kernel: Lustre: Skipped 81 previous similar messages Mar 20 06:45:35 fir-md1-s2 kernel: LNet: Service thread pid 91628 completed after 239.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 06:45:35 fir-md1-s2 kernel: LustreError: 91484:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553089445, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecf28aae780/0xefacb2c3576173d0 lrc: 3/1,0 mode: --/PR res: [0x24000ed30:0x190c:0x0].0x0 bits 0x20/0x0 rrc: 31 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91484 timeout: 0 lvb_type: 0 Mar 20 06:45:35 fir-md1-s2 kernel: LustreError: 91484:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 3 previous similar messages Mar 20 06:46:05 fir-md1-s2 kernel: LustreError: 111821:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ebc88522400 ns: mdt-fir-MDT0001_UUID lock: ffff8ebbfd3bfbc0/0xefacb2c35a22d55d lrc: 3/0,0 mode: PR/PR res: [0x24000ed30:0x190c:0x0].0x0 bits 0x20/0x0 rrc: 28 type: IBT flags: 0x50200000000000 nid: 10.8.3.12@o2ib6 remote: 0xca67744c22db79e5 expref: 1729 pid: 111821 timeout: 0 lvb_type: 0 Mar 20 06:46:05 fir-md1-s2 kernel: LustreError: 111821:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 06:46:05 fir-md1-s2 kernel: Lustre: 91452:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:30s); client may timeout. 
req@ffff8ec311e33300 x1626191684138384/t0(0) o101->070c5d4d-518c-dc6c-5dcf-f6e61b582aee@10.8.3.28@o2ib6:5/0 lens 568/1680 e 0 to 0 dl 1553089535 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 06:46:05 fir-md1-s2 kernel: Lustre: 91452:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Mar 20 06:46:27 fir-md1-s2 kernel: LustreError: 91404:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x240006091:0xbe2f:0x0] Mar 20 06:49:45 fir-md1-s2 kernel: Lustre: 14607:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:49:45 fir-md1-s2 kernel: Lustre: 14607:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 43 previous similar messages Mar 20 06:59:46 fir-md1-s2 kernel: Lustre: 14609:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 06:59:46 fir-md1-s2 kernel: Lustre: 14609:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 12 previous similar messages Mar 20 07:00:53 fir-md1-s2 kernel: LustreError: 91765:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000ece3:0x519e:0x0] Mar 20 07:12:50 fir-md1-s2 kernel: Lustre: 91382:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 07:12:50 fir-md1-s2 kernel: Lustre: 91382:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous similar messages Mar 20 07:17:38 fir-md1-s2 kernel: LustreError: 91690:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000602e:0x6a45:0x0] Mar 20 07:23:15 fir-md1-s2 kernel: Lustre: 14616:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 07:23:15 fir-md1-s2 kernel: Lustre: 14616:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 28 previous similar messages Mar 20 07:26:40 fir-md1-s2 kernel: LustreError: 90898:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to 
missing obj [0x24000e5fe:0x7721:0x0] Mar 20 07:34:03 fir-md1-s2 kernel: Lustre: 91619:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 07:34:03 fir-md1-s2 kernel: Lustre: 91619:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 29 previous similar messages Mar 20 07:34:31 fir-md1-s2 kernel: Lustre: 91237:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8eb2bbaf0900 x1626123579763344/t0(0) o101->810fe316-e09a-254c-3020-2540e531f84e@10.9.101.7@o2ib4:6/0 lens 480/568 e 1 to 0 dl 1553092476 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 07:34:31 fir-md1-s2 kernel: Lustre: 91237:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Mar 20 07:34:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 810fe316-e09a-254c-3020-2540e531f84e (at 10.9.101.7@o2ib4) reconnecting Mar 20 07:34:37 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to b436319f-69f7-dd07-932a-7b1aa38ec017 (at 10.9.101.12@o2ib4) Mar 20 07:34:37 fir-md1-s2 kernel: Lustre: Skipped 14 previous similar messages Mar 20 07:34:37 fir-md1-s2 kernel: Lustre: Skipped 14 previous similar messages Mar 20 07:34:46 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.3@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed928a51680/0xefacb2c3b34f3f9d lrc: 3/0,0 mode: PW/PW res: [0x24000ed23:0xf4:0x0].0x0 bits 0x40/0x0 rrc: 34 type: IBT flags: 0x60200400000020 nid: 10.9.101.3@o2ib4 remote: 0x58d12bfdf409870e expref: 263 pid: 14629 timeout: 937029 lvb_type: 0 Mar 20 07:34:46 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Mar 20 07:34:46 fir-md1-s2 kernel: LustreError: 91466:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8edd2eaf2000 ns: mdt-fir-MDT0001_UUID lock: 
ffff8edfb202b3c0/0xefacb2c3b34f5863 lrc: 3/0,0 mode: PW/PW res: [0x24000ed23:0xf4:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x50200400000020 nid: 10.9.101.3@o2ib4 remote: 0x58d12bfdf4098715 expref: 7 pid: 91466 timeout: 0 lvb_type: 0 Mar 20 07:34:46 fir-md1-s2 kernel: LustreError: 91466:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Mar 20 07:35:46 fir-md1-s2 kernel: LustreError: 91630:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553092456, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed29be61680/0xefacb2c3b34f5b8f lrc: 3/0,1 mode: --/PW res: [0x24000ed23:0xf4:0x0].0x0 bits 0x40/0x0 rrc: 34 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91630 timeout: 0 lvb_type: 0 Mar 20 07:35:46 fir-md1-s2 kernel: LustreError: 91630:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Mar 20 07:36:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 20 07:36:01 fir-md1-s2 kernel: Lustre: Skipped 16 previous similar messages Mar 20 07:36:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.24@o2ib4) Mar 20 07:36:01 fir-md1-s2 kernel: Lustre: Skipped 22 previous similar messages Mar 20 07:37:36 fir-md1-s2 kernel: LNet: Service thread pid 91630 was inactive for 200.56s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 07:37:36 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 07:37:36 fir-md1-s2 kernel: Pid: 91630, comm: mdt02_091 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:37:36 fir-md1-s2 kernel: Call Trace: Mar 20 07:37:36 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:37:36 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:37:36 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:37:36 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:37:36 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 07:37:36 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 07:37:37 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 07:37:37 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:37:37 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:37:37 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:37:37 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:37:37 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:37:37 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:37:37 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:37:37 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:37:37 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:37:37 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:37:37 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553092657.91630 Mar 20 07:37:38 fir-md1-s2 kernel: LNet: Service thread pid 91438 was inactive for 202.11s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 07:37:38 fir-md1-s2 kernel: Pid: 91438, comm: mdt00_043 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:37:38 fir-md1-s2 kernel: Call Trace: Mar 20 07:37:38 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:37:38 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:37:38 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 07:37:38 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:37:38 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:37:38 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:37:38 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:37:38 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:38:06 fir-md1-s2 kernel: LNet: Service thread pid 91242 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 07:38:06 fir-md1-s2 kernel: Pid: 91242, comm: mdt03_008 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:38:06 fir-md1-s2 kernel: Call Trace: Mar 20 07:38:06 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:38:06 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 20 07:38:06 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:38:06 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:38:06 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:38:06 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:38:06 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:38:06 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:38:06 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553092686.91242 Mar 20 07:38:07 fir-md1-s2 kernel: Pid: 91513, comm: mdt00_056 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:38:07 fir-md1-s2 kernel: Call Trace: Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 
[mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:38:07 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:38:07 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:38:07 fir-md1-s2 kernel: Pid: 91651, comm: mdt03_027 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:38:07 fir-md1-s2 kernel: Call Trace: Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 
kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:38:07 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:38:07 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:38:07 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:38:07 fir-md1-s2 kernel: LNet: Service thread pid 99933 was inactive for 201.03s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 07:38:07 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 20 07:38:16 fir-md1-s2 kernel: Lustre: 91215:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb506c2e300 x1628037138187472/t0(0) o101->f2f4d52c-2807-60dd-8532-99fa3e9aeefa@10.0.10.3@o2ib7:21/0 lens 576/3264 e 0 to 0 dl 1553092701 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 07:38:16 fir-md1-s2 kernel: Lustre: 91215:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Mar 20 07:38:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 20 07:38:49 fir-md1-s2 kernel: Lustre: Skipped 38 previous similar messages Mar 20 07:38:49 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.24@o2ib4) Mar 20 07:38:49 fir-md1-s2 kernel: Lustre: Skipped 38 previous similar messages Mar 20 07:39:21 fir-md1-s2 kernel: LustreError: 91645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553092671, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eb1d0bca880/0xefacb2c3bbaf27c8 lrc: 3/1,0 mode: --/PR res: [0x24000ed23:0xf4:0x0].0x0 bits 0x12/0x0 rrc: 36 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91645 timeout: 0 lvb_type: 0 Mar 20 07:39:21 
fir-md1-s2 kernel: LustreError: 91645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Mar 20 07:41:11 fir-md1-s2 kernel: LNet: Service thread pid 91645 was inactive for 200.29s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 07:41:11 fir-md1-s2 kernel: LNet: Skipped 8 previous similar messages Mar 20 07:41:11 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553092871.91645 Mar 20 07:44:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 07:44:04 fir-md1-s2 kernel: Lustre: Skipped 84 previous similar messages Mar 20 07:44:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 07:44:04 fir-md1-s2 kernel: Lustre: Skipped 84 previous similar messages Mar 20 07:47:08 fir-md1-s2 kernel: Lustre: 91586:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ec28eb3ef00 x1626187259707520/t0(0) o101->36e39237-5b34-0fcd-d3aa-6021119926c9@10.8.2.3@o2ib6:13/0 lens 480/568 e 1 to 0 dl 1553093233 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 07:47:22 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.8.3.3@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8eced62a4800/0xefacb2c3cf37b45e lrc: 3/0,0 mode: PW/PW res: [0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 72 type: IBT flags: 0x60200400000020 nid: 10.8.3.3@o2ib6 remote: 0x1e9e366442a3626a expref: 53290 pid: 91362 timeout: 937785 lvb_type: 0 Mar 20 07:47:23 fir-md1-s2 kernel: LustreError: 90860:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8eb6b4a2b600 x1628272719347808/t0(0) o104->fir-MDT0001@10.8.3.3@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 07:47:25 fir-md1-s2 kernel: LustreError: 
91495:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ed906ac8c00 x1628272719510560/t0(0) o104->fir-MDT0001@10.8.3.3@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 07:47:28 fir-md1-s2 kernel: LustreError: 91591:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ed906ac8900 x1628272719763824/t0(0) o104->fir-MDT0001@10.8.3.3@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 07:48:23 fir-md1-s2 kernel: LustreError: 14634:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553093213, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eef6133a880/0xefacb2c3cf37bb6c lrc: 3/0,1 mode: --/PW res: [0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 75 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 14634 timeout: 0 lvb_type: 0 Mar 20 07:48:23 fir-md1-s2 kernel: LustreError: 14634:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 17 previous similar messages Mar 20 07:49:57 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.3.21@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8eddd4fc2d00/0xefacb2c3cf37b6f0 lrc: 3/0,0 mode: PW/PW res: [0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 75 type: IBT flags: 0x60200400000020 nid: 10.8.3.21@o2ib6 remote: 0x227934e37cdf7475 expref: 14 pid: 91340 timeout: 937940 lvb_type: 0 Mar 20 07:50:13 fir-md1-s2 kernel: LNet: Service thread pid 91590 was inactive for 200.46s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 07:50:13 fir-md1-s2 kernel: LNet: Skipped 2 previous similar messages Mar 20 07:50:13 fir-md1-s2 kernel: Pid: 91590, comm: mdt00_069 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:50:13 fir-md1-s2 kernel: Call Trace: Mar 20 07:50:13 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:50:13 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:50:13 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 07:50:13 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 07:50:13 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 07:50:13 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:50:13 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:50:13 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:50:13 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:50:13 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:50:13 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553093413.91590 Mar 20 07:50:15 fir-md1-s2 kernel: LNet: Service thread pid 91250 was inactive for 201.97s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 07:50:15 fir-md1-s2 kernel: Pid: 91250, comm: mdt01_023 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:50:15 fir-md1-s2 kernel: Call Trace: Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:50:15 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:50:15 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:50:15 fir-md1-s2 kernel: Pid: 91213, comm: mdt00_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:50:15 fir-md1-s2 kernel: Call Trace: Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:50:15 
fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:50:15 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:50:15 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:50:15 fir-md1-s2 kernel: Pid: 14628, comm: mdt00_102 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:50:15 fir-md1-s2 kernel: Call Trace: Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:50:15 
fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:50:15 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:50:15 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:50:15 fir-md1-s2 kernel: Pid: 91452, comm: mdt01_063 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 07:50:15 fir-md1-s2 kernel: Call Trace: Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 07:50:15 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 07:50:15 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 07:50:15 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 07:50:15 fir-md1-s2 kernel: LNet: Service thread pid 91536 was inactive for 202.47s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 07:50:47 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553093447.91212 Mar 20 07:50:47 fir-md1-s2 kernel: LNet: Service thread pid 91245 was inactive for 200.45s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 07:50:47 fir-md1-s2 kernel: LNet: Skipped 6 previous similar messages Mar 20 07:51:27 fir-md1-s2 kernel: LustreError: 91065:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553093397, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec79d26f740/0xefacb2c3d57108b1 lrc: 3/0,1 mode: --/PW res: [0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 66 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91065 timeout: 0 lvb_type: 0 Mar 20 07:51:27 fir-md1-s2 kernel: LustreError: 91065:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 12 previous similar messages Mar 20 07:51:35 fir-md1-s2 kernel: LustreError: 91864:0:(ldlm_lib.c:3258:target_bulk_io()) @@@ Reconnect on bulk READ req@ffff8ede6ae9cb00 x1627407431756656/t0(0) o37->a3b7d610-c065-c97e-bbec-d3030e9c4b9d@10.8.3.33@o2ib6:11/0 lens 448/440 e 0 to 0 dl 1553093501 ref 1 fl Interpret:/0/0 rc 0/0 Mar 20 07:52:13 fir-md1-s2 kernel: Lustre: 91547:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553093526/real 1553093526] req@ffff8ec32bf37500 x1628272740190800/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553093533 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 07:52:13 fir-md1-s2 kernel: Lustre: 91547:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Mar 20 07:52:27 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.8.3.30@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ecc62217740/0xefacb2c3cf37bc7d lrc: 3/0,0 mode: PW/PW res: 
[0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 66 type: IBT flags: 0x60200400000020 nid: 10.8.3.30@o2ib6 remote: 0x4fa3d464e4ae867a expref: 39 pid: 91065 timeout: 938090 lvb_type: 0 Mar 20 07:52:27 fir-md1-s2 kernel: LNet: Service thread pid 90854 completed after 334.18s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 07:52:27 fir-md1-s2 kernel: LustreError: 91536:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ebf682b0000 ns: mdt-fir-MDT0001_UUID lock: ffff8ece6868f080/0xefacb2c3cf37c3a7 lrc: 3/0,0 mode: PR/PR res: [0x24000ed40:0xd:0x0].0x0 bits 0x20/0x0 rrc: 58 type: IBT flags: 0x50200000000000 nid: 10.8.3.3@o2ib6 remote: 0x1e9e366442a36271 expref: 2 pid: 91536 timeout: 0 lvb_type: 0 Mar 20 07:52:27 fir-md1-s2 kernel: LustreError: 91536:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 07:52:27 fir-md1-s2 kernel: Lustre: 91536:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:293s); client may timeout. 
req@ffff8ec32be89500 x1627864642714128/t0(0) o101->3b34a789-11fd-9f70-20e4-3cdcec8a4a47@10.8.3.3@o2ib6:13/0 lens 568/1688 e 1 to 0 dl 1553093254 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 07:52:27 fir-md1-s2 kernel: Lustre: 91547:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553093540/real 1553093540] req@ffff8ec32bf37500 x1628272740190800/t0(0) o104->fir-MDT0003@10.8.27.23@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553093547 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 07:52:27 fir-md1-s2 kernel: Lustre: 91547:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 20 07:52:27 fir-md1-s2 kernel: LNet: Skipped 30 previous similar messages Mar 20 07:52:31 fir-md1-s2 kernel: Lustre: 91452:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec425ecb000 x1626134161479472/t0(0) o101->87fbd0c4-680c-902e-9d81-1141e47ac8d5@10.8.7.22@o2ib6:6/0 lens 1776/3288 e 0 to 0 dl 1553093556 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 07:52:31 fir-md1-s2 kernel: Lustre: 91452:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 44 previous similar messages Mar 20 07:52:41 fir-md1-s2 kernel: LustreError: 91547:0:(ldlm_lockd.c:682:ldlm_handle_ast_error()) ### client (nid 10.8.27.23@o2ib6) failed to reply to blocking AST (req@ffff8ec32bf37500 x1628272740190800 status 0 rc -110), evict it ns: mdt-fir-MDT0003_UUID lock: ffff8ec017628fc0/0xefacb2c3d9015735 lrc: 4/0,0 mode: PR/PR res: [0x2800065bf:0x12965:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.23@o2ib6 remote: 0x110ccd84794a60c0 expref: 305 pid: 91560 timeout: 938127 lvb_type: 0 Mar 20 07:52:41 fir-md1-s2 kernel: LustreError: 138-a: fir-MDT0003: A client on nid 10.8.27.23@o2ib6 was evicted due to a lock blocking callback time out: rc -110 Mar 20 07:52:57 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 
30s: evicting client at 10.8.2.8@o2ib6 ns: mdt-fir-MDT0001_UUID lock: ffff8ed00f879b00/0xefacb2c3daa83150 lrc: 3/0,0 mode: PW/PW res: [0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 26 type: IBT flags: 0x60200400000020 nid: 10.8.2.8@o2ib6 remote: 0xcb9a552e5acfdaa1 expref: 31604 pid: 91609 timeout: 938120 lvb_type: 0 Mar 20 07:52:57 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 07:52:59 fir-md1-s2 kernel: LustreError: 91462:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eefcb45b800 ns: mdt-fir-MDT0001_UUID lock: ffff8ee033f9af40/0xefacb2c3daa83a09 lrc: 3/0,0 mode: PW/PW res: [0x24000ed40:0xd:0x0].0x0 bits 0x40/0x0 rrc: 24 type: IBT flags: 0x50200400000020 nid: 10.8.2.8@o2ib6 remote: 0xcb9a552e5acfdabd expref: 21170 pid: 91462 timeout: 0 lvb_type: 0 Mar 20 07:52:59 fir-md1-s2 kernel: LustreError: 91462:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 4 previous similar messages Mar 20 07:52:59 fir-md1-s2 kernel: Lustre: 91462:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:2s); client may timeout. 
req@ffff8ed336f31e00 x1626314221125520/t0(0) o101->d49a9195-3b1c-45e3-f960-3201c2aa7def@10.8.2.8@o2ib6:27/0 lens 480/536 e 0 to 0 dl 1553093577 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 07:52:59 fir-md1-s2 kernel: Lustre: 91462:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Mar 20 07:53:01 fir-md1-s2 kernel: LustreError: 91347:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ec615f26000 x1628272744369696/t0(0) o104->fir-MDT0001@10.8.2.8@o2ib6:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 07:53:01 fir-md1-s2 kernel: LustreError: 91347:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message Mar 20 07:54:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 20 07:54:13 fir-md1-s2 kernel: Lustre: Skipped 352 previous similar messages Mar 20 07:54:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.24@o2ib4) Mar 20 07:54:13 fir-md1-s2 kernel: Lustre: Skipped 356 previous similar messages Mar 20 07:55:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 58fe5ad1-8676-9b8c-0acf-d3a6f6cbb3c3 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8ec17e665000, cur 1553093728 expire 1553093578 last 1553093501 Mar 20 07:55:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Mar 20 08:04:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 1831b60b-c0b3-6e16-f786-e9804146d690 (at 10.9.101.9@o2ib4) reconnecting Mar 20 08:04:13 fir-md1-s2 kernel: Lustre: Skipped 166 previous similar messages Mar 20 08:04:13 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.9@o2ib4) Mar 20 08:04:13 fir-md1-s2 kernel: Lustre: Skipped 168 previous similar messages Mar 20 08:12:55 fir-md1-s2 kernel: Lustre: 91353:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8eef92bcaa00 x1627854646333664/t0(0) o101->5535dd63-6e9b-7280-e906-63ff0acc4285@10.9.101.37@o2ib4:0/0 lens 576/3264 e 1 to 0 dl 1553094780 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 08:12:55 fir-md1-s2 kernel: Lustre: 91353:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Mar 20 08:13:10 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.16@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ec1aa27a1c0/0xefacb2c3fdbd22af lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 58 type: IBT flags: 0x60200400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643b9f05fb expref: 502607 pid: 91452 timeout: 939333 lvb_type: 0 Mar 20 08:13:16 fir-md1-s2 kernel: LustreError: 99932:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8eec5fee6f00 x1628272827290288/t0(0) o104->fir-MDT0001@10.9.101.16@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 08:13:46 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.16@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ec3e56969c0/0xefacb2c31a684dda lrc: 3/0,0 mode: PW/PW 
res: [0x24000ed24:0x24:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643a9d47d8 expref: 315985 pid: 91578 timeout: 939369 lvb_type: 0 Mar 20 08:14:10 fir-md1-s2 kernel: LustreError: 91060:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553094760, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ebcaf068240/0xefacb2c3fdbd2357 lrc: 3/0,1 mode: --/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 58 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91060 timeout: 0 lvb_type: 0 Mar 20 08:14:10 fir-md1-s2 kernel: LustreError: 91060:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 20 previous similar messages Mar 20 08:14:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 7f326dd9-6b3c-45f1-d1dc-35b8a959cac2 (at 10.9.101.16@o2ib4) reconnecting Mar 20 08:14:18 fir-md1-s2 kernel: Lustre: Skipped 213 previous similar messages Mar 20 08:14:18 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.16@o2ib4) Mar 20 08:14:18 fir-md1-s2 kernel: Lustre: Skipped 214 previous similar messages Mar 20 08:14:46 fir-md1-s2 kernel: LustreError: 99932:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553094796, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eea1d205580/0xefacb2c3fe90e691 lrc: 3/0,1 mode: --/PW res: [0x24000ed24:0x24:0x0].0x0 bits 0x40/0x0 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 99932 timeout: 0 lvb_type: 0 Mar 20 08:15:34 fir-md1-s2 kernel: Lustre: 91064:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:133s); client may timeout. 
req@ffff8ed19f2e4200 x1626131943566928/t112631090549(0) o36->7f326dd9-6b3c-45f1-d1dc-35b8a959cac2@10.9.101.16@o2ib4:0/0 lens 488/424 e 1 to 0 dl 1553094801 ref 1 fl Complete:/0/0 rc 0/0 Mar 20 08:15:59 fir-md1-s2 kernel: Lustre: 91447:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8edfd3fd9800 x1627854147084896/t0(0) o101->59331f1c-da80-18ea-511e-2185ff6b2811@10.9.101.38@o2ib4:4/0 lens 568/0 e 0 to 0 dl 1553094964 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 08:15:59 fir-md1-s2 kernel: Lustre: 91447:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Mar 20 08:16:00 fir-md1-s2 kernel: LNet: Service thread pid 91401 was inactive for 200.22s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 08:16:00 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 08:16:00 fir-md1-s2 kernel: Pid: 91401, comm: mdt01_052 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:16:00 fir-md1-s2 kernel: Call Trace: Mar 20 08:16:00 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:16:00 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:16:00 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:16:00 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:16:00 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 08:16:00 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:16:01 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:16:01 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:16:01 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:16:01 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:16:01 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 
08:16:01 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:16:01 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:16:01 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:16:01 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:16:01 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553094961.91401 Mar 20 08:16:02 fir-md1-s2 kernel: LNet: Service thread pid 91599 was inactive for 201.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 08:16:02 fir-md1-s2 kernel: Pid: 91599, comm: mdt00_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:16:02 fir-md1-s2 kernel: Call Trace: Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:16:02 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:16:02 fir-md1-s2 kernel: [] 0xffffffffffffffff 
Mar 20 08:16:02 fir-md1-s2 kernel: Pid: 91472, comm: mdt02_050 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:16:02 fir-md1-s2 kernel: Call Trace: Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:16:02 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:16:02 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:16:02 fir-md1-s2 kernel: Pid: 91246, comm: mdt01_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:16:02 fir-md1-s2 kernel: Call Trace: Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] 
mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:16:02 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:16:02 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:16:02 fir-md1-s2 kernel: Pid: 99936, comm: mdt03_033 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:16:02 fir-md1-s2 kernel: Call Trace: Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] 
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:16:02 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:16:02 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:16:02 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:16:02 fir-md1-s2 kernel: LNet: Service thread pid 91240 was inactive for 202.25s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 08:16:02 fir-md1-s2 kernel: LNet: Skipped 12 previous similar messages Mar 20 08:16:37 fir-md1-s2 kernel: LNet: Service thread pid 99932 was inactive for 200.50s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 08:16:37 fir-md1-s2 kernel: LNet: Skipped 10 previous similar messages Mar 20 08:16:37 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553094997.99932 Mar 20 08:17:04 fir-md1-s2 kernel: LustreError: 91578:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553094934, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eefb6df8240/0xefacb2c401d0df1c lrc: 3/1,0 mode: --/PR res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x20/0x0 rrc: 58 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91578 timeout: 0 lvb_type: 0 Mar 20 08:17:04 fir-md1-s2 kernel: LustreError: 91578:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 2 previous similar messages Mar 20 08:17:16 fir-md1-s2 kernel: LNet: Service thread pid 99932 completed after 239.66s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 08:18:04 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.45@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed2f1bdcec0/0xefacb2c3fdbd2381 lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 58 type: IBT flags: 0x60200400000020 nid: 10.9.108.45@o2ib4 remote: 0x382e5ec80e8b1ec4 expref: 56 pid: 91103 timeout: 939627 lvb_type: 0 Mar 20 08:18:04 fir-md1-s2 kernel: LNet: Service thread pid 91388 completed after 323.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 08:18:04 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 08:18:20 fir-md1-s2 kernel: Lustre: 91583:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-146), not sending early reply req@ffff8eec5fee1800 x1626276328227984/t0(0) o101->b1eb7517-52d1-d05c-8fdc-dce5ebc8d797@10.9.108.49@o2ib4:25/0 lens 568/0 e 0 to 0 dl 1553095105 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 08:18:34 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 30s: evicting client at 10.9.101.46@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ebbf98486c0/0xefacb2c3fdbd2484 lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 57 type: IBT flags: 0x60200400000020 nid: 10.9.101.46@o2ib4 remote: 0x2c5956b488dd310b expref: 34 pid: 91388 timeout: 939657 lvb_type: 0 Mar 20 08:18:34 fir-md1-s2 kernel: LNet: Service thread pid 91599 completed after 353.68s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 08:18:34 fir-md1-s2 kernel: LustreError: 91534:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eb7c5cbe000 ns: mdt-fir-MDT0001_UUID lock: ffff8eeb33e4f2c0/0xefacb2c3fdbd27b0 lrc: 3/0,0 mode: PR/PR res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x20/0x0 rrc: 51 type: IBT flags: 0x50200000000000 nid: 10.9.101.16@o2ib4 remote: 0x3c8988643b9f0609 expref: 2 pid: 91534 timeout: 0 lvb_type: 0 Mar 20 08:18:34 fir-md1-s2 kernel: LustreError: 91534:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 08:18:34 fir-md1-s2 kernel: Lustre: 91534:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:313s); client may timeout. req@ffff8eefe5eabf00 x1626131943566960/t0(0) o101->7f326dd9-6b3c-45f1-d1dc-35b8a959cac2@10.9.101.16@o2ib4:0/0 lens 568/1688 e 1 to 0 dl 1553094801 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 08:18:34 fir-md1-s2 kernel: LNet: Skipped 11 previous similar messages Mar 20 08:20:04 fir-md1-s2 kernel: LustreError: 90855:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553095114, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ecdd9af8fc0/0xefacb2c40616976d lrc: 3/0,1 mode: --/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 49 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 90855 timeout: 0 lvb_type: 0 Mar 20 08:20:04 fir-md1-s2 kernel: LustreError: 90855:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 14 previous similar messages Mar 20 08:21:04 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.49@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eb807676c00/0xefacb2c406169743 lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 49 type: IBT flags: 0x60200400000020 nid: 10.9.108.49@o2ib4 remote: 
0x4e240f9fd80003cf expref: 40 pid: 14632 timeout: 939807 lvb_type: 0 Mar 20 08:21:04 fir-md1-s2 kernel: LustreError: 90854:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ec17fb82c00 ns: mdt-fir-MDT0001_UUID lock: ffff8eb25b21f080/0xefacb2c40616980e lrc: 3/0,0 mode: PR/PR res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x1b/0x0 rrc: 45 type: IBT flags: 0x50200400000020 nid: 10.9.108.49@o2ib4 remote: 0x4e240f9fd80003d6 expref: 22 pid: 90854 timeout: 0 lvb_type: 0 Mar 20 08:21:04 fir-md1-s2 kernel: LustreError: 90854:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 20 08:21:34 fir-md1-s2 kernel: LustreError: 91253:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ec17fb82c00 ns: mdt-fir-MDT0001_UUID lock: ffff8edeefac1200/0xefacb2c406169cd7 lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 41 type: IBT flags: 0x50200400000020 nid: 10.9.108.49@o2ib4 remote: 0x4e240f9fd80003dd expref: 7 pid: 91253 timeout: 0 lvb_type: 0 Mar 20 08:21:34 fir-md1-s2 kernel: Lustre: 91060:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:26s); client may timeout. 
req@ffff8eef92be9200 x1626276328251056/t0(0) o55->b1eb7517-52d1-d05c-8fdc-dce5ebc8d797@10.9.108.49@o2ib4:4/0 lens 472/192 e 0 to 0 dl 1553095268 ref 1 fl Complete:/0/0 rc -22/-22 Mar 20 08:21:34 fir-md1-s2 kernel: Lustre: 91060:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Mar 20 08:21:34 fir-md1-s2 kernel: LustreError: 90855:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ebb782a3300 x1628272855753584/t0(0) o104->fir-MDT0001@10.9.101.31@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 08:24:04 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.108.39@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ecfb7152640/0xefacb2c40a2109a1 lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 40 type: IBT flags: 0x60200400000020 nid: 10.9.108.39@o2ib4 remote: 0x8eba35301a8b70b3 expref: 39 pid: 91468 timeout: 939987 lvb_type: 0 Mar 20 08:24:04 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 08:24:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 10.9.101.24@o2ib4) reconnecting Mar 20 08:24:20 fir-md1-s2 kernel: Lustre: Skipped 396 previous similar messages Mar 20 08:24:20 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.24@o2ib4) Mar 20 08:24:20 fir-md1-s2 kernel: Lustre: Skipped 401 previous similar messages Mar 20 08:24:29 fir-md1-s2 kernel: Lustre: 91532:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8edc1be12100 x1626128051139344/t0(0) o101->c7a2bbac-bb32-5981-1af8-ec75ce917a08@10.9.101.32@o2ib4:4/0 lens 568/0 e 0 to 0 dl 1553095474 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 08:24:29 fir-md1-s2 kernel: Lustre: 91532:0:(service.c:1372:ptlrpc_at_send_early_reply()) 
Skipped 31 previous similar messages Mar 20 08:24:54 fir-md1-s2 kernel: LNet: Service thread pid 91477 was inactive for 200.36s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 08:24:54 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 08:24:54 fir-md1-s2 kernel: Pid: 91477, comm: mdt01_070 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:24:54 fir-md1-s2 kernel: Call Trace: Mar 20 08:24:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:24:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:24:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:24:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:24:55 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 08:24:55 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 08:24:55 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 08:24:55 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:24:55 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:24:55 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:24:55 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:24:55 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:24:55 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:24:55 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:24:55 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:24:55 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:24:55 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:24:55 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553095495.91477 Mar 20 08:24:56 fir-md1-s2 kernel: Pid: 14632, comm: mdt02_108 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 
08:24:56 fir-md1-s2 kernel: Call Trace: Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:24:56 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:24:56 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:24:56 fir-md1-s2 kernel: Pid: 91254, comm: mdt01_026 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:24:56 fir-md1-s2 kernel: Call Trace: Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] 
mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:24:56 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:24:56 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:24:56 fir-md1-s2 kernel: Pid: 14615, comm: mdt00_096 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:24:56 fir-md1-s2 kernel: Call Trace: Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 
[ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:24:56 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:24:56 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:24:56 fir-md1-s2 kernel: Pid: 91240, comm: mdt03_007 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:24:56 fir-md1-s2 kernel: Call Trace: Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:24:56 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:24:56 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:24:56 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:24:56 fir-md1-s2 kernel: LNet: Service thread pid 91512 was inactive for 202.34s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 08:25:34 fir-md1-s2 kernel: LustreError: 99934:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553095444, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ef046afaac0/0xefacb2c40d6f79aa lrc: 3/1,0 mode: --/PR res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x20/0x0 rrc: 38 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 99934 timeout: 0 lvb_type: 0 Mar 20 08:25:34 fir-md1-s2 kernel: LustreError: 99934:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 11 previous similar messages Mar 20 08:26:34 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.32@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eea5eeb1f80/0xefacb2c40a2109bd lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 38 type: IBT flags: 0x60200400000020 nid: 10.9.101.32@o2ib4 remote: 0xb355be0d89414518 expref: 940 pid: 99934 timeout: 940137 lvb_type: 0 Mar 20 08:26:34 fir-md1-s2 kernel: LNet: Service thread pid 14615 completed after 299.80s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 08:27:05 fir-md1-s2 kernel: LustreError: 91263:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8edc62b80c00 x1628272875407488/t0(0) o104->fir-MDT0001@10.9.101.37@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 08:27:05 fir-md1-s2 kernel: LNet: Service thread pid 90854 completed after 331.32s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 08:27:20 fir-md1-s2 kernel: LustreError: 91533:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ec3bda53000 x1628272876233712/t0(0) o104->fir-MDT0001@10.9.101.37@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 08:27:20 fir-md1-s2 kernel: LustreError: 91533:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 2 previous similar messages Mar 20 08:27:24 fir-md1-s2 kernel: LNet: Service thread pid 99934 was inactive for 200.56s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 08:27:24 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 08:27:24 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553095644.99934 Mar 20 08:27:35 fir-md1-s2 kernel: LNet: Service thread pid 91477 completed after 360.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 08:28:05 fir-md1-s2 kernel: LNet: Service thread pid 91512 completed after 390.76s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 08:28:05 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 08:29:54 fir-md1-s2 kernel: LNet: Service thread pid 90864 was inactive for 200.07s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 08:29:54 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 08:29:54 fir-md1-s2 kernel: Pid: 90864, comm: mdt03_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:29:54 fir-md1-s2 kernel: Call Trace: Mar 20 08:29:54 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:29:54 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:29:54 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 08:29:54 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:29:54 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:29:54 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:29:54 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:29:54 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:29:54 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553095794.90864 Mar 20 08:30:26 fir-md1-s2 kernel: LNet: Service thread pid 91535 was inactive for 200.30s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 08:30:26 fir-md1-s2 kernel: Pid: 91535, comm: mdt01_087 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 08:30:26 fir-md1-s2 kernel: Call Trace: Mar 20 08:30:26 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 08:30:26 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 08:30:26 fir-md1-s2 kernel: [] mdt_intent_getxattr+0xb5/0x270 [mdt] Mar 20 08:30:26 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 08:30:26 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 08:30:26 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 08:30:26 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 08:30:26 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 08:30:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553095826.91535 Mar 20 08:30:35 fir-md1-s2 kernel: LNet: Service thread pid 99938 completed after 540.75s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 08:30:35 fir-md1-s2 kernel: LustreError: 91263:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eec7778a400 ns: mdt-fir-MDT0001_UUID lock: ffff8ed211e70240/0xefacb2c41292095f lrc: 3/0,0 mode: PR/PR res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x20/0x0 rrc: 24 type: IBT flags: 0x50200000000000 nid: 10.9.101.38@o2ib4 remote: 0x4fba3b47dc3a8010 expref: 24 pid: 91263 timeout: 0 lvb_type: 0 Mar 20 08:30:35 fir-md1-s2 kernel: LustreError: 91263:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 3 previous similar messages Mar 20 08:30:35 fir-md1-s2 kernel: Lustre: 91535:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (30:180s); client may timeout. req@ffff8ec889fb5400 x1626698362937920/t0(0) o101->21c1d1e0-a07e-9376-435e-d7152822fd0e@10.9.101.20@o2ib4:5/0 lens 568/1688 e 0 to 0 dl 1553095655 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 08:30:35 fir-md1-s2 kernel: Lustre: 91535:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2 previous similar messages Mar 20 08:30:35 fir-md1-s2 kernel: LNet: Skipped 7 previous similar messages Mar 20 08:33:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.46@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ec630f44ec0/0xefacb2c415bb5a7f lrc: 3/0,0 mode: PW/PW res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x40/0x0 rrc: 22 type: IBT flags: 0x60200400000020 nid: 10.9.101.46@o2ib4 remote: 0x2c5956b488dd43a3 expref: 51 pid: 99934 timeout: 940528 lvb_type: 0 Mar 20 08:33:05 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 4 previous similar messages Mar 20 08:33:30 fir-md1-s2 kernel: Lustre: 99938:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eec30794e00 x1626175102503920/t0(0) o101->ae2661a8-bdb6-51b2-7483-d53084bfe4c1@10.9.101.27@o2ib4:5/0 lens 568/0 e 0 to 0 dl 
1553096015 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Mar 20 08:33:30 fir-md1-s2 kernel: Lustre: 99938:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages Mar 20 08:33:35 fir-md1-s2 kernel: LustreError: 91546:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eef927f8000 ns: mdt-fir-MDT0001_UUID lock: ffff8ecc553921c0/0xefacb2c415bb602f lrc: 3/0,0 mode: PR/PR res: [0x24000ed67:0x18ec:0x0].0x0 bits 0x20/0x0 rrc: 17 type: IBT flags: 0x50200000000000 nid: 10.9.101.46@o2ib4 remote: 0x2c5956b488dd43aa expref: 2 pid: 91546 timeout: 0 lvb_type: 0 Mar 20 08:33:35 fir-md1-s2 kernel: LustreError: 91546:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 5 previous similar messages Mar 20 08:33:35 fir-md1-s2 kernel: Lustre: 91546:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (154:26s); client may timeout. req@ffff8ec428615a00 x1626212292763936/t0(0) o101->eb92f976-b859-22ab-5714-898607c188de@10.9.101.46@o2ib4:5/0 lens 568/1688 e 0 to 0 dl 1553095989 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 08:33:35 fir-md1-s2 kernel: Lustre: 91546:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 4 previous similar messages Mar 20 08:33:36 fir-md1-s2 kernel: LustreError: 99934:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8eef432f9800 x1628272897455776/t0(0) o104->fir-MDT0001@10.9.101.27@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 08:33:44 fir-md1-s2 kernel: LustreError: 91535:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8ec63337fb00 x1628272897928640/t0(0) o104->fir-MDT0001@10.9.101.27@o2ib4:15/16 lens 296/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 Mar 20 08:33:44 fir-md1-s2 kernel: LustreError: 91535:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 4 previous similar messages Mar 20 08:34:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client fa565bf9-4e3f-b131-86ef-18427c7a396c (at 
10.9.101.24@o2ib4) reconnecting Mar 20 08:34:29 fir-md1-s2 kernel: Lustre: Skipped 263 previous similar messages Mar 20 08:34:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.101.24@o2ib4) Mar 20 08:34:29 fir-md1-s2 kernel: Lustre: Skipped 270 previous similar messages Mar 20 08:39:47 fir-md1-s2 kernel: LustreError: 91438:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8edd2eaf2000 ns: mdt-fir-MDT0001_UUID lock: ffff8eba5c7a6e40/0xefacb2c3b34f5ca7 lrc: 3/0,0 mode: PR/PR res: [0x24000ed23:0xf4:0x0].0x0 bits 0x20/0x0 rrc: 32 type: IBT flags: 0x50200000000000 nid: 10.9.101.3@o2ib4 remote: 0x58d12bfdf409872a expref: 2 pid: 91438 timeout: 0 lvb_type: 0 Mar 20 08:39:47 fir-md1-s2 kernel: LNet: Service thread pid 91630 completed after 3931.28s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 08:39:47 fir-md1-s2 kernel: LustreError: 91438:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 08:39:47 fir-md1-s2 kernel: Lustre: 91438:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (41:3890s); client may timeout. 
req@ffff8eb2a8559200 x1626160150852352/t0(0) o101->251ee163-1302-af54-569b-06691a518665@10.9.101.3@o2ib4:6/0 lens 568/1688 e 1 to 0 dl 1553092497 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 08:40:17 fir-md1-s2 kernel: LustreError: 91513:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eb935ff5000 ns: mdt-fir-MDT0001_UUID lock: ffff8eb1e122bf00/0xefacb2c42137849b lrc: 3/0,0 mode: PW/PW res: [0x24000ed23:0xf4:0x0].0x0 bits 0x40/0x0 rrc: 28 type: IBT flags: 0x50200400000020 nid: 10.9.101.6@o2ib4 remote: 0xc4d4d1f1ca7922ec expref: 287 pid: 91513 timeout: 0 lvb_type: 0 Mar 20 08:40:21 fir-md1-s2 kernel: Lustre: 14615:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 08:40:21 fir-md1-s2 kernel: Lustre: 14615:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 13 previous similar messages Mar 20 08:41:42 fir-md1-s2 kernel: Lustre: 91623:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 08:41:42 fir-md1-s2 kernel: Lustre: 91623:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Mar 20 08:44:34 fir-md1-s2 kernel: Lustre: 91248:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 08:44:34 fir-md1-s2 kernel: Lustre: 91248:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 9 previous similar messages Mar 20 08:49:07 fir-md1-s2 kernel: Lustre: fir-MDT0003: Connection restored to (at 10.8.11.10@o2ib6) Mar 20 08:49:07 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.11.10@o2ib6) Mar 20 08:49:07 fir-md1-s2 kernel: Lustre: Skipped 87 previous similar messages Mar 20 08:52:43 fir-md1-s2 kernel: Lustre: 91498:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 08:52:43 fir-md1-s2 kernel: Lustre: 91498:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 17 previous 
similar messages Mar 20 08:54:12 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client a18f5541-141e-0b60-6f2b-a0d2a0024968 (at 10.9.104.48@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8eeb00229000, cur 1553097252 expire 1553097102 last 1553097025 Mar 20 08:55:28 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client 8c9bebd2-342e-0eed-0fe5-980eceed6b2a (at 10.8.10.28@o2ib6) in 225 seconds. I think it's dead, and I am evicting it. exp ffff8eee796b6000, cur 1553097328 expire 1553097178 last 1553097103 Mar 20 08:55:28 fir-md1-s2 kernel: Lustre: Skipped 29 previous similar messages Mar 20 08:59:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client dca8880b-a0a6-6e7b-c8b2-9c4e4dd52c70 (at 10.9.113.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ebf8e141000, cur 1553097581 expire 1553097431 last 1553097354 Mar 20 08:59:41 fir-md1-s2 kernel: Lustre: Skipped 7 previous similar messages Mar 20 08:59:52 fir-md1-s2 kernel: Lustre: 99936:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8edf5661ad00 x1627854781666576/t0(0) o101->fa565bf9-4e3f-b131-86ef-18427c7a396c@10.9.101.24@o2ib4:27/0 lens 480/568 e 1 to 0 dl 1553097597 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 08:59:52 fir-md1-s2 kernel: Lustre: 99936:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 23 previous similar messages Mar 20 08:59:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client 21c1d1e0-a07e-9376-435e-d7152822fd0e (at 10.9.101.20@o2ib4) reconnecting Mar 20 08:59:58 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f6951712-d609-35d5-7d48-8ccf309c970b (at 10.9.101.19@o2ib4) Mar 20 08:59:58 fir-md1-s2 kernel: Lustre: Skipped 89 previous similar messages Mar 20 09:01:07 fir-md1-s2 kernel: LustreError: 91240:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553097577, 90s ago); not entering recovery in server 
code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ec3ce6a0fc0/0xefacb2c442b05608 lrc: 3/0,1 mode: --/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x2/0x0 rrc: 42 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 91240 timeout: 0 lvb_type: 0 Mar 20 09:01:07 fir-md1-s2 kernel: LustreError: 91240:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 30 previous similar messages Mar 20 09:01:22 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f6951712-d609-35d5-7d48-8ccf309c970b (at 10.9.101.19@o2ib4) reconnecting Mar 20 09:01:22 fir-md1-s2 kernel: Lustre: Skipped 17 previous similar messages Mar 20 09:02:06 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.101.19@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eb76ca38480/0xefacb2c442b0513f lrc: 3/0,0 mode: PW/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x40/0x0 rrc: 42 type: IBT flags: 0x60200400000020 nid: 10.9.101.19@o2ib4 remote: 0xcb7379996fdebea6 expref: 62 pid: 91105 timeout: 942269 lvb_type: 0 Mar 20 09:02:06 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 3 previous similar messages Mar 20 09:02:06 fir-md1-s2 kernel: LustreError: 91263:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8eedf9774000 ns: mdt-fir-MDT0001_UUID lock: ffff8ed23e98dc40/0xefacb2c442b05694 lrc: 3/0,0 mode: PW/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x40/0x0 rrc: 34 type: IBT flags: 0x50200400000020 nid: 10.9.101.19@o2ib4 remote: 0xcb7379996fdebead expref: 8 pid: 91263 timeout: 0 lvb_type: 0 Mar 20 09:02:06 fir-md1-s2 kernel: LustreError: 91263:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 09:02:31 fir-md1-s2 kernel: Lustre: 14643:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8eb70debb000 x1626169503655152/t0(0) 
o101->f6951712-d609-35d5-7d48-8ccf309c970b@10.9.101.19@o2ib4:6/0 lens 600/3264 e 0 to 0 dl 1553097756 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 09:02:31 fir-md1-s2 kernel: Lustre: 14643:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 19 previous similar messages Mar 20 09:02:57 fir-md1-s2 kernel: LNet: Service thread pid 91665 was inactive for 200.65s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 09:02:57 fir-md1-s2 kernel: Pid: 91665, comm: mdt00_080 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:02:57 fir-md1-s2 kernel: Call Trace: Mar 20 09:02:57 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:02:57 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:02:57 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 09:02:57 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 09:02:58 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 09:02:58 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 09:02:58 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 09:02:58 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 09:02:58 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 09:02:58 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 09:02:58 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 09:02:58 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:02:58 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:02:58 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:02:58 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:02:58 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:02:58 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:02:58 fir-md1-s2 kernel: LustreError: dumping log to 
/tmp/lustre-log.1553097778.91665 Mar 20 09:02:59 fir-md1-s2 kernel: LNet: Service thread pid 91585 was inactive for 202.17s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 09:02:59 fir-md1-s2 kernel: Pid: 91585, comm: mdt03_022 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:02:59 fir-md1-s2 kernel: Call Trace: Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_hsm_state_set+0xc9/0x830 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:02:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:02:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:02:59 fir-md1-s2 kernel: Pid: 90864, comm: mdt03_001 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:02:59 fir-md1-s2 kernel: Call Trace: Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock_try+0x27/0xb0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x1287/0x1c30 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: 
[] mdt_intent_getattr+0x2b5/0x480 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:02:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:02:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:02:59 fir-md1-s2 kernel: Pid: 91575, comm: mdt00_064 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:02:59 fir-md1-s2 kernel: Call Trace: Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_intent_policy+0x2e8/0xd00 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_lock_enqueue+0x366/0xa60 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_handle_enqueue0+0xa47/0x15a0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] 
ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:02:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:02:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:02:59 fir-md1-s2 kernel: Pid: 91620, comm: mdt00_073 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:02:59 fir-md1-s2 kernel: Call Trace: Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_reint_object_lock+0x2c/0x60 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_reint_striped_lock+0x8c/0x510 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_reint_setattr+0x6c8/0x12d0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 20 09:02:59 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:02:59 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:02:59 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:02:59 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:02:59 fir-md1-s2 kernel: LNet: Service thread pid 91578 was inactive for 202.65s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 09:03:36 fir-md1-s2 kernel: LustreError: 90859:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553097726, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed033a3ca40/0xefacb2c447a9068d lrc: 3/0,1 mode: --/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x40/0x0 rrc: 37 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 90859 timeout: 0 lvb_type: 0 Mar 20 09:04:02 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 09:04:02 fir-md1-s2 kernel: Lustre: Skipped 35 previous similar messages Mar 20 09:04:36 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 150s: evicting client at 10.9.101.9@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8ed510f95c40/0xefacb2c442b057eb lrc: 3/0,0 mode: PW/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x40/0x0 rrc: 37 type: IBT flags: 0x60200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e89838 expref: 272 pid: 91522 timeout: 942419 lvb_type: 0 Mar 20 09:04:36 fir-md1-s2 kernel: LNet: Service thread pid 91366 completed after 299.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). 
Mar 20 09:04:36 fir-md1-s2 kernel: LNet: Skipped 14 previous similar messages Mar 20 09:05:06 fir-md1-s2 kernel: LustreError: 91575:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ebe9d16cc00 ns: mdt-fir-MDT0001_UUID lock: ffff8eb75d761200/0xefacb2c442b05b8e lrc: 3/0,0 mode: PW/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x40/0x0 rrc: 35 type: IBT flags: 0x50200400000020 nid: 10.9.101.18@o2ib4 remote: 0x51e79ecb4507c0ca expref: 17492 pid: 91575 timeout: 0 lvb_type: 0 Mar 20 09:05:06 fir-md1-s2 kernel: LustreError: 91575:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 1 previous similar message Mar 20 09:05:06 fir-md1-s2 kernel: LNet: Service thread pid 91575 completed after 329.52s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 09:05:06 fir-md1-s2 kernel: LNet: Skipped 1 previous similar message Mar 20 09:05:07 fir-md1-s2 kernel: Lustre: 91663:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (314:16s); client may timeout. req@ffff8eb808120000 x1626174131593856/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:27/0 lens 480/536 e 1 to 0 dl 1553097891 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 09:05:07 fir-md1-s2 kernel: Lustre: 91663:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Mar 20 09:05:17 fir-md1-s2 kernel: Lustre: 91401:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ec1d6a80300 x1626174131669920/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:22/0 lens 600/3264 e 0 to 0 dl 1553097922 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 09:05:17 fir-md1-s2 kernel: Lustre: 91401:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Mar 20 09:05:26 fir-md1-s2 kernel: LNet: Service thread pid 90859 was inactive for 200.03s. 
Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Mar 20 09:05:26 fir-md1-s2 kernel: LNet: Skipped 6 previous similar messages Mar 20 09:05:26 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553097926.90859 Mar 20 09:06:06 fir-md1-s2 kernel: LustreError: 91060:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553097876, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8eee5a65a400/0xefacb2c44c802d45 lrc: 3/1,0 mode: --/PR res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x20/0x0 rrc: 28 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 91060 timeout: 0 lvb_type: 0 Mar 20 09:06:06 fir-md1-s2 kernel: LustreError: 91060:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 5 previous similar messages Mar 20 09:06:20 fir-md1-s2 kernel: LNet: Service thread pid 91215 was inactive for 200.34s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. 
Mar 20 09:06:20 fir-md1-s2 kernel: LNet: Skipped 4 previous similar messages Mar 20 09:06:20 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553097980.91215 Mar 20 09:07:36 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.101.20@o2ib4 ns: mdt-fir-MDT0001_UUID lock: ffff8eb558ec3840/0xefacb2c442b05ee4 lrc: 3/0,0 mode: PW/PW res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x40/0x0 rrc: 28 type: IBT flags: 0x60200400000020 nid: 10.9.101.20@o2ib4 remote: 0xee8b0e099e51c9cd expref: 47 pid: 91665 timeout: 942599 lvb_type: 0 Mar 20 09:07:36 fir-md1-s2 kernel: LustreError: 90840:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Mar 20 09:07:36 fir-md1-s2 kernel: LustreError: 90864:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) ### lock on destroyed export ffff8ec161624c00 ns: mdt-fir-MDT0001_UUID lock: ffff8ec9f5757500/0xefacb2c442b06153 lrc: 3/0,0 mode: PR/PR res: [0x24000ed7c:0xdd:0x0].0x0 bits 0x1b/0x0 rrc: 24 type: IBT flags: 0x50200400000020 nid: 10.9.101.9@o2ib4 remote: 0x20d2718795e89870 expref: 4 pid: 90864 timeout: 0 lvb_type: 0 Mar 20 09:07:36 fir-md1-s2 kernel: LNet: Service thread pid 91267 completed after 479.09s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Mar 20 09:07:36 fir-md1-s2 kernel: LNet: Skipped 5 previous similar messages Mar 20 09:07:36 fir-md1-s2 kernel: LustreError: 90864:0:(ldlm_lockd.c:1357:ldlm_handle_enqueue0()) Skipped 2 previous similar messages Mar 20 09:07:36 fir-md1-s2 kernel: Lustre: 90864:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (314:165s); client may timeout. 
req@ffff8eef5066bc00 x1626174131594032/t0(0) o101->1831b60b-c0b3-6e16-f786-e9804146d690@10.9.101.9@o2ib4:27/0 lens 576/1168 e 1 to 0 dl 1553097891 ref 1 fl Complete:/0/0 rc -107/-107 Mar 20 09:07:36 fir-md1-s2 kernel: Lustre: 90864:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3 previous similar messages Mar 20 09:07:45 fir-md1-s2 kernel: Lustre: 91375:0:(mdd_device.c:1794:mdd_changelog_clear()) fir-MDD0001: Failure to clear the changelog for user 1: -22 Mar 20 09:07:45 fir-md1-s2 kernel: Lustre: 91375:0:(mdd_device.c:1794:mdd_changelog_clear()) Skipped 6 previous similar messages Mar 20 09:13:09 fir-md1-s2 kernel: Lustre: 91642:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/-5), not sending early reply req@ffff8ed5d7729b00 x1626251799025920/t112686935075(0) o36->d8b1c71c-afe3-dcf8-5df2-190b0d37ec72@10.9.107.38@o2ib4:14/0 lens 536/2888 e 0 to 0 dl 1553098394 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 09:13:09 fir-md1-s2 kernel: Lustre: 91642:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 6 previous similar messages Mar 20 09:13:13 fir-md1-s2 kernel: LustreError: 91700:0:(mdt_io.c:442:mdt_preprw_write()) fir-MDT0001: BRW to missing obj [0x24000ed76:0x1d306:0x0] Mar 20 09:13:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client d8b1c71c-afe3-dcf8-5df2-190b0d37ec72 (at 10.9.107.38@o2ib4) reconnecting Mar 20 09:13:15 fir-md1-s2 kernel: Lustre: Skipped 41 previous similar messages Mar 20 09:13:15 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 09:13:15 fir-md1-s2 kernel: Lustre: Skipped 104 previous similar messages Mar 20 09:14:14 fir-md1-s2 kernel: LustreError: 14629:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1553098364, 90s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0001_UUID lock: ffff8ed6f3a560c0/0xefacb2c45c745bd0 lrc: 3/0,1 mode: --/PW res: 
[0x240008e92:0x17fd0:0x0].0x0 bits 0x40/0x0 rrc: 4 type: IBT flags: 0x40010080000000 nid: local remote: 0x0 expref: -99 pid: 14629 timeout: 0 lvb_type: 0 Mar 20 09:14:14 fir-md1-s2 kernel: LustreError: 14629:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 4 previous similar messages Mar 20 09:16:05 fir-md1-s2 kernel: LNet: Service thread pid 14629 was inactive for 200.55s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Mar 20 09:16:05 fir-md1-s2 kernel: LNet: Skipped 3 previous similar messages Mar 20 09:16:05 fir-md1-s2 kernel: Pid: 14629, comm: mdt02_105 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:16:05 fir-md1-s2 kernel: Call Trace: Mar 20 09:16:05 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:16:05 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:16:05 fir-md1-s2 kernel: [] mdt_dom_discard_data+0x101/0x130 [mdt] Mar 20 09:16:05 fir-md1-s2 kernel: [] mdt_reint_unlink+0x331/0x14b0 [mdt] Mar 20 09:16:05 fir-md1-s2 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Mar 20 09:16:05 fir-md1-s2 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Mar 20 09:16:05 fir-md1-s2 kernel: [] mdt_reint+0x67/0x140 [mdt] Mar 20 09:16:05 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:16:05 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:16:05 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:16:05 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:16:05 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:16:05 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:16:05 fir-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1553098565.14629 Mar 20 09:16:06 fir-md1-s2 kernel: LNet: Service thread pid 91619 was inactive for 202.04s. The thread might be hung, or it might only be slow and will resume later. 
Dumping the stack trace for debugging purposes: Mar 20 09:16:06 fir-md1-s2 kernel: Pid: 91619, comm: mdt00_072 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Dec 7 14:50:35 PST 2018 Mar 20 09:16:06 fir-md1-s2 kernel: Call Trace: Mar 20 09:16:06 fir-md1-s2 kernel: [] ldlm_completion_ast+0x63d/0x920 [ptlrpc] Mar 20 09:16:06 fir-md1-s2 kernel: [] ldlm_cli_enqueue_local+0x23c/0x870 [ptlrpc] Mar 20 09:16:06 fir-md1-s2 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Mar 20 09:16:06 fir-md1-s2 kernel: [] mdt_object_lock_internal+0x70/0x3e0 [mdt] Mar 20 09:16:06 fir-md1-s2 kernel: [] mdt_getattr_name_lock+0x11d/0x1c30 [mdt] Mar 20 09:16:06 fir-md1-s2 kernel: [] mdt_getattr_name+0xc4/0x2b0 [mdt] Mar 20 09:16:06 fir-md1-s2 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Mar 20 09:16:06 fir-md1-s2 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Mar 20 09:16:06 fir-md1-s2 kernel: [] ptlrpc_main+0xafc/0x1fc0 [ptlrpc] Mar 20 09:16:06 fir-md1-s2 kernel: [] kthread+0xd1/0xe0 Mar 20 09:16:06 fir-md1-s2 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Mar 20 09:16:06 fir-md1-s2 kernel: [] 0xffffffffffffffff Mar 20 09:23:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to 0912605b-5489-ac4a-4b69-232f8c87381d (at 10.9.104.40@o2ib4) Mar 20 09:23:25 fir-md1-s2 kernel: Lustre: Skipped 98 previous similar messages Mar 20 09:23:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 09:23:36 fir-md1-s2 kernel: Lustre: Skipped 38 previous similar messages Mar 20 09:33:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to d8b1c71c-afe3-dcf8-5df2-190b0d37ec72 (at 10.9.107.38@o2ib4) Mar 20 09:33:25 fir-md1-s2 kernel: Lustre: Skipped 85 previous similar messages Mar 20 09:33:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 09:33:56 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 
09:43:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 09:43:45 fir-md1-s2 kernel: Lustre: Skipped 38 previous similar messages Mar 20 09:44:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 09:44:16 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 09:48:47 fir-md1-s2 kernel: LNetError: 90656:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Mar 20 09:54:05 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 09:54:05 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 09:54:36 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 09:54:36 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 09:55:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client 2d5487f8-fc66-6ff4-f642-a858d8c06d39 (at 10.8.14.1@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8ecdc1b19800, cur 1553100932 expire 1553100782 last 1553100705 Mar 20 09:55:32 fir-md1-s2 kernel: Lustre: Skipped 61 previous similar messages Mar 20 09:56:03 fir-md1-s2 kernel: perf: interrupt took too long (4018 > 3917), lowering kernel.perf_event_max_sample_rate to 49000 Mar 20 10:02:39 fir-md1-s2 kernel: LNetError: 90655:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Mar 20 10:04:25 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 10:04:25 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 10:04:56 fir-md1-s2 kernel: Lustre: fir-MDT0001: Client f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) reconnecting Mar 20 10:04:56 fir-md1-s2 kernel: Lustre: Skipped 39 previous similar messages Mar 20 10:07:09 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553101621/real 1553101621] req@ffff8ec154f7bf00 x1628273331682224/t0(0) o104->fir-MDT0003@10.8.17.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553101628 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Mar 20 10:07:09 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 20 10:07:16 fir-md1-s2 kernel: Lustre: 91545:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff8ec48730da00 x1627854694389744/t117865377602(0) o36->80ff07d8-e182-1d05-a62c-cc8fa29edc49@10.8.20.32@o2ib6:21/0 lens 488/3152 e 1 to 0 dl 1553101641 ref 2 fl Interpret:/0/0 rc 0/0 Mar 20 10:07:16 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553101629/real 1553101629] req@ffff8ec154f7bf00 x1628273331682224/t0(0) o104->fir-MDT0003@10.8.17.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553101636 ref 1 fl 
Rpc:X/2/ffffffff rc 0/-1 Mar 20 10:07:23 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553101636/real 1553101636] req@ffff8ec154f7bf00 x1628273331682224/t0(0) o104->fir-MDT0003@10.8.17.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553101643 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 10:07:37 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553101650/real 1553101650] req@ffff8ec154f7bf00 x1628273331682224/t0(0) o104->fir-MDT0003@10.8.17.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553101657 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 10:07:37 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 1 previous similar message Mar 20 10:07:58 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553101671/real 1553101671] req@ffff8ec154f7bf00 x1628273331682224/t0(0) o104->fir-MDT0003@10.8.17.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553101678 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 10:07:58 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Mar 20 10:08:33 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1553101706/real 1553101706] req@ffff8ec154f7bf00 x1628273331682224/t0(0) o104->fir-MDT0003@10.8.17.3@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1553101713 ref 1 fl Rpc:X/2/ffffffff rc 0/-1 Mar 20 10:08:33 fir-md1-s2 kernel: Lustre: 91114:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 4 previous similar messages Mar 20 10:09:36 fir-md1-s2 kernel: Lustre: fir-MDT0003: haven't heard from client b4e5396b-977b-db1e-9737-f69d52fedd19 (at 10.8.8.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. 
exp ffff8ed078433400, cur 1553101776 expire 1553101626 last 1553101549 Mar 20 10:09:36 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message Mar 20 10:09:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: haven't heard from client c949f5b2-7a2e-f96d-353d-05d9c3a77be2 (at 10.8.7.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8ee8552ea800, cur 1553101781 expire 1553101631 last 1553101554 Mar 20 10:09:41 fir-md1-s2 kernel: Lustre: Skipped 63 previous similar messages Mar 20 10:14:32 fir-md1-s2 kernel: LNetError: 90655:0:(lib-msg.c:811:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Mar 20 10:14:45 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to f2f4d52c-2807-60dd-8532-99fa3e9aeefa (at 10.0.10.3@o2ib7) Mar 20 10:14:45 fir-md1-s2 kernel: Lustre: Skipped 46 previous similar messages