Nov 04 23:55:19 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 04 23:59:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.17@o2ib4) Nov 04 23:59:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:00:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.21@o2ib4) Nov 05 00:00:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:01:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:01:13 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 00:02:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7f9aa22d-36da-1a94-f631-a264f7aa9590 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa13e3478f000, cur 1572940966 expire 1572940816 last 1572940739 Nov 05 00:02:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:03:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6c6ffb8c-d9db-2aba-c433-7efcd410416a (at 10.8.26.4@o2ib6) Nov 05 00:03:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:08:20 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 61666ad4-a74e-87a0-ba54-1d3f2aad9b41 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12d6e971400, cur 1572941300 expire 1572941150 last 1572941073 Nov 05 00:08:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:11:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:11:15 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 00:12:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6c6ffb8c-d9db-2aba-c433-7efcd410416a (at 10.8.26.4@o2ib6) Nov 05 00:12:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:21:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:21:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:21:18 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 00:28:52 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6c6ffb8c-d9db-2aba-c433-7efcd410416a (at 10.8.26.4@o2ib6) Nov 05 00:28:52 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:29:26 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 23e87ab7-b7a9-fab5-66b2-6a5845711369 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa136d6b4c000, cur 1572942566 expire 1572942416 last 1572942339 Nov 05 00:29:26 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:31:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:31:20 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 00:33:32 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6c6ffb8c-d9db-2aba-c433-7efcd410416a (at 10.8.26.4@o2ib6) Nov 05 00:33:32 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:34:46 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 5fc51756-3c12-1f30-edcd-d23b80a2b589 (at 10.8.26.4@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa11e364b1800, cur 1572942886 expire 1572942736 last 1572942659 Nov 05 00:34:46 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 00:41:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:41:22 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 00:51:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:51:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 00:51:24 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 00:58:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.51@o2ib4) Nov 05 00:58:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 01:01:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.52@o2ib4) Nov 05 01:01:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 01:01:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 01:01:26 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 01:11:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 01:11:28 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 01:21:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 01:21:30 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 01:31:32 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 01:31:32 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 01:41:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 01:41:34 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 01:51:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 01:51:37 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 02:01:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 02:01:39 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 02:08:50 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client eec53668-9e11-a36f-991f-f83d08f9400c (at 10.9.108.69@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12a3b70f800, cur 1572948530 expire 1572948380 last 1572948303 Nov 05 02:08:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 02:11:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 02:11:41 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 02:21:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 02:21:43 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 02:31:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 02:31:45 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 02:32:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.69@o2ib4) Nov 05 02:32:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 02:41:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 02:41:47 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 02:51:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 02:51:49 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 03:01:51 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:01:51 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 03:11:53 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:11:53 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 03:21:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:21:56 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 03:31:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:31:58 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 03:42:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:42:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:42:00 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 03:52:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 03:52:02 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 04:02:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 04:02:04 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 04:12:06 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 04:12:06 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 04:22:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 04:22:08 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 04:32:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 04:32:10 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 04:42:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 04:42:12 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 04:52:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 04:52:15 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 04:56:51 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.14@o2ib6) Nov 05 04:56:51 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 05:02:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 05:02:17 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 05:12:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 05:12:19 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 05:22:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 05:22:21 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 05:32:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 05:32:23 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 05:36:39 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.13@o2ib4) Nov 05 05:36:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 05:42:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 05:42:25 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 05:52:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 05:52:27 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 06:02:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 06:02:29 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 06:12:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 06:12:31 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 06:22:34 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 06:22:34 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 06:32:36 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 06:32:36 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 06:42:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 06:42:38 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 06:52:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 06:52:40 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 07:03:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 07:03:01 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 07:13:08 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 07:13:08 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 07:22:20 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.12@o2ib4) Nov 05 07:22:20 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 07:23:10 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 07:23:10 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 07:26:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.116.5@o2ib4) Nov 05 07:26:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 07:33:12 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 07:33:12 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 07:34:58 fir-md1-s1 kernel: perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 Nov 05 07:43:14 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 07:43:14 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 07:53:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 07:53:16 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:03:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 08:03:19 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:13:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 08:13:21 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:23:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 08:23:23 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:25:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 51a21c02-9c85-2ad1-5519-18d441d20b35 (at 10.9.110.14@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa10df923ac00, cur 1572971129 expire 1572970979 last 1572970902 Nov 05 08:25:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:33:25 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 08:33:25 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:36:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.4@o2ib4) Nov 05 08:36:17 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:37:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b183ba99-6f34-bb51-d879-80707620fdc9 (at 10.8.27.23@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12e1652a000, cur 1572971845 expire 1572971695 last 1572971618 Nov 05 08:37:25 fir-md1-s1 kernel: Lustre: Skipped 77 previous similar messages Nov 05 08:42:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.8@o2ib4) Nov 05 08:42:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:43:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.112.12@o2ib4) Nov 05 08:43:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:43:27 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 08:43:27 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:43:44 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.10@o2ib4) Nov 05 08:43:44 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:44:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.13@o2ib4) Nov 05 08:44:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:44:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.112.13@o2ib4) Nov 05 08:44:29 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Nov 05 08:46:33 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.110.20@o2ib4) Nov 05 08:46:33 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 08:47:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.110.55@o2ib4) Nov 05 08:47:05 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Nov 05 08:48:17 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.40@o2ib4) Nov 05 08:48:17 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Nov 05 08:50:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 46c32956-662b-0706-c80a-bc0e57525ada (at 10.9.106.4@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa1254a451400, cur 1572972633 expire 1572972483 last 1572972406 Nov 05 08:50:33 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 08:52:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.116.6@o2ib4) Nov 05 08:52:59 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Nov 05 08:53:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 08:53:29 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 08:58:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.110.21@o2ib4) Nov 05 08:58:15 fir-md1-s1 kernel: Lustre: Skipped 13 previous similar messages Nov 05 09:03:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 09:03:31 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 09:07:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.102.46@o2ib4) Nov 05 09:07:01 fir-md1-s1 kernel: Lustre: Skipped 15 previous similar messages Nov 05 09:08:32 fir-md1-s1 kernel: perf: interrupt took too long (3130 > 3128), lowering kernel.perf_event_max_sample_rate to 63000 Nov 05 09:13:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 09:13:33 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 09:18:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8e383076-a475-0bf7-60a7-327bb5d9b5ef (at 10.9.116.10@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa121572a4800, cur 1572974315 expire 1572974165 last 1572974088 Nov 05 09:18:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 09:23:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 09:23:35 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 09:29:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.4@o2ib4) Nov 05 09:29:04 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Nov 05 09:33:38 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 09:33:38 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 09:36:10 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f9d7a98a-9714-1d86-ab2e-6cf84b814c56 (at 10.9.115.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12d332ddc00, cur 1572975370 expire 1572975220 last 1572975143 Nov 05 09:36:10 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 09:37:35 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.11@o2ib4) Nov 05 09:37:35 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 09:43:40 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 09:43:40 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 09:52:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.116.10@o2ib4) Nov 05 09:52:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 09:53:42 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 09:53:42 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 09:56:24 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.3@o2ib4) Nov 05 09:56:24 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 10:03:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:03:44 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 10:06:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4a58c126-a568-2ef3-292b-60f093d08be1 (at 10.9.115.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa10ecae38000, cur 1572977189 expire 1572977039 last 1572976962 Nov 05 10:06:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 10:13:46 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:13:46 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 10:23:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:23:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:23:48 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 10:27:04 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.3@o2ib4) Nov 05 10:27:04 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 10:33:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:33:50 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 10:40:30 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.116.10@o2ib4) Nov 05 10:40:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 10:43:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:43:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:43:52 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 10:44:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d9e8d1d1-af07-ce57-473c-319ce9637cb5 (at 10.9.116.3@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12dde116800, cur 1572979476 expire 1572979326 last 1572979249 Nov 05 10:44:36 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 10:53:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 10:53:54 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 10:54:29 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client 7867b39b-8c2f-9360-7dc8-eb7198fb273f (at 10.9.107.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa13dc7ab6000, cur 1572980069 expire 1572979919 last 1572979842 Nov 05 10:54:29 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Nov 05 10:54:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 0173e697-ddaa-53e3-60b2-4f9d4eed09b5 (at 10.9.107.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12db06dac00, cur 1572980077 expire 1572979927 last 1572979850 Nov 05 11:03:02 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.11@o2ib4) Nov 05 11:03:02 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 11:03:57 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:03:57 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 11:04:08 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.116.3@o2ib4) Nov 05 11:04:08 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 11:06:54 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.4@o2ib4) Nov 05 11:06:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 11:13:59 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:13:59 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 11:18:13 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.50@o2ib4) Nov 05 11:18:13 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 11:19:21 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.49@o2ib4) Nov 05 11:19:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 11:24:01 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:24:01 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 11:34:03 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:34:03 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 11:41:06 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.112.1@o2ib4) Nov 05 11:41:06 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 11:44:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:44:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:44:05 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 11:49:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d0970a80-0067-6f05-bc50-bc98c606719c (at 10.9.106.60@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa1210c2d0800, cur 1572983363 expire 1572983213 last 1572983136 Nov 05 11:54:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 11:54:07 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 12:03:50 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.2@o2ib4) Nov 05 12:03:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 12:04:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 12:04:09 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 12:13:23 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.29@o2ib4) Nov 05 12:13:23 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 12:13:47 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.60@o2ib4) Nov 05 12:13:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 12:14:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 12:14:11 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 12:24:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 12:24:13 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 12:34:16 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 12:34:16 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 12:44:18 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 12:44:18 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 12:47:12 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 7f4b342c-9749-8c84-c672-1dbe439818af (at 10.9.106.54@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12e1652e400, cur 1572986832 expire 1572986682 last 1572986605 Nov 05 12:47:12 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 12:49:41 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.54@o2ib4) Nov 05 12:49:41 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 12:54:20 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 12:54:20 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 12:59:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 9d70720d-7016-4c13-d12c-539714dab902 (at 10.9.108.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa1214f73bc00, cur 1572987561 expire 1572987411 last 1572987334 Nov 05 12:59:21 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 13:01:36 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.20@o2ib4) Nov 05 13:01:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 13:04:22 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:04:22 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 13:14:24 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:14:24 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 13:24:26 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:24:26 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 13:24:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client ec422765-5d0c-62dc-308c-7711ebc93482 (at 10.9.108.20@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa134fffd3c00, cur 1572989078 expire 1572988928 last 1572988851 Nov 05 13:24:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 13:25:15 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.20@o2ib4) Nov 05 13:25:15 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 13:34:28 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:34:28 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 13:43:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.26@o2ib4) Nov 05 13:43:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 13:44:30 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:44:30 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 13:51:57 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.40@o2ib4) Nov 05 13:51:57 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 13:54:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:54:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 13:54:33 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 14:04:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 14:04:35 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 14:08:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f105afde-8fa5-537d-8346-d77717e1b922 (at 10.8.31.10@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12e1029a400, cur 1572991716 expire 1572991566 last 1572991489 Nov 05 14:08:36 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 14:14:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 14:14:37 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 14:18:50 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client e55d06a4-4856-ba5e-428b-486382f61d70 (at 10.9.106.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa13dc6c52000, cur 1572992330 expire 1572992180 last 1572992103 Nov 05 14:18:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 14:18:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f4f23800-f63d-171b-dcef-50edbe5d431b (at 10.9.106.13@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12c89240800, cur 1572992336 expire 1572992186 last 1572992109 Nov 05 14:24:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 14:24:39 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 14:34:37 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.31.10@o2ib6) Nov 05 14:34:37 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 14:34:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 14:34:41 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 14:41:01 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.13@o2ib4) Nov 05 14:41:01 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 14:44:27 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.30.5@o2ib6) Nov 05 14:44:27 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 14:44:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 14:44:43 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 14:54:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 14:54:45 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 15:04:47 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 15:04:47 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 15:14:49 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 15:14:49 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 15:24:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 15:24:52 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 15:25:23 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 01c55571-2d56-bead-b405-095b48ee38e2 (at 10.9.107.50@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa10ae9399400, cur 1572996323 expire 1572996173 last 1572996096 Nov 05 15:34:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 15:34:54 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 15:43:43 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.11@o2ib4) Nov 05 15:43:43 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 15:44:00 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 16505992-06d4-841b-2891-ba7d1e60ea4e (at 10.9.113.1@o2ib4) Nov 05 15:44:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 15:44:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 15:44:56 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 15:48:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.109.66@o2ib4) Nov 05 15:48:05 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 15:48:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 6caeb0b0-1879-6bd2-8b60-f282f622a3f7 (at 10.9.109.68@o2ib4) Nov 05 15:48:22 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 15:48:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.109.4@o2ib4) Nov 05 15:48:29 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 15:48:56 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.59@o2ib4) Nov 05 15:48:56 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 15:49:29 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.107.49@o2ib4) Nov 05 15:49:29 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 15:54:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 15:54:58 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 15:58:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.112.1@o2ib4) Nov 05 15:58:28 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:05:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 16:05:00 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 16:13:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.53@o2ib4) Nov 05 16:13:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:13:42 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.30@o2ib4) Nov 05 16:13:42 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:13:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.1@o2ib4) Nov 05 16:13:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:15:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 16:15:02 fir-md1-s1 kernel: LustreError: Skipped 22 previous similar messages Nov 05 16:16:43 fir-md1-s1 kernel: LNetError: 42404:0:(o2iblnd_cb.c:2961:kiblnd_rejected()) 10.0.10.204@o2ib7 rejected: o2iblnd fatal error Nov 05 16:16:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3c020cd0-089d-acb1-e879-86429192cebf (at 10.8.27.2@o2ib6) reconnecting Nov 05 16:16:45 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:16:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.2@o2ib6) Nov 05 16:16:45 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 16:16:47 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.8@o2ib6, removing former export from same NID Nov 05 16:16:50 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.9@o2ib6, removing former export from same NID Nov 05 16:16:50 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:16:52 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.20.3@o2ib6, removing former export from same NID Nov 05 16:16:52 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 16:16:54 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.7.13@o2ib6, removing former export from same NID Nov 05 16:16:54 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 16:16:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 379bd79e-4547-3245-7942-cd3e89e75fec (at 10.8.20.11@o2ib6) reconnecting Nov 05 16:16:54 fir-md1-s1 kernel: Lustre: Skipped 24 previous similar messages Nov 05 16:16:58 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.18.13@o2ib6, removing former export from same NID Nov 05 16:16:58 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Nov 05 16:17:05 fir-md1-s1 kernel: LustreError: 41567:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffffa12e3d721850 x1649289152157184/t0(0) o4->c777a6c5-e0f8-3280-9717-224e36706d7f@10.8.27.3@o2ib6:738/0 lens 488/448 e 1 to 0 dl 1572999448 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:17:05 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with c777a6c5-e0f8-3280-9717-224e36706d7f (at 10.8.27.3@o2ib6), client will retry: rc = -110 Nov 05 16:17:05 fir-md1-s1 kernel: LustreError: 41567:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 2 previous similar messages Nov 05 16:17:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.31.5@o2ib6, removing former export from same NID Nov 05 16:17:07 fir-md1-s1 kernel: Lustre: Skipped 23 previous similar messages Nov 05 16:17:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 18b27f89-a34a-31bb-f8dd-7db0e726f174 (at 10.8.15.3@o2ib6) reconnecting Nov 05 16:17:14 fir-md1-s1 kernel: Lustre: Skipped 32 previous similar messages Nov 05 16:17:43 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.22.8@o2ib6, removing former export from same NID Nov 05 16:17:43 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Nov 05 16:17:53 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.22.13@o2ib6) Nov 05 16:17:53 fir-md1-s1 kernel: Lustre: Skipped 124 previous similar messages Nov 05 16:18:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3267df8e-4520-8b87-bad4-c836649229ba (at 10.8.26.4@o2ib6) reconnecting Nov 05 16:18:34 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:21:13 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 1 seconds Nov 05 16:24:49 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.0.66@o2ib6) Nov 05 16:24:49 fir-md1-s1 kernel: Lustre: Skipped 6 previous similar messages Nov 05 16:25:05 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 16:25:05 fir-md1-s1 kernel: LustreError: Skipped 37 previous similar messages Nov 05 16:26:13 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.204@o2ib7: 0 seconds Nov 05 16:28:16 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client b3a05eed-2a4b-daf7-96e8-65768daebb42 (at 10.8.7.13@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12dd7c3e800, cur 1573000096 expire 1572999946 last 1572999869 Nov 05 16:28:16 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Nov 05 16:28:22 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client d8a13d34-6bac-f796-7b10-4ee168b88c28 (at 10.8.24.31@o2ib6) reconnecting Nov 05 16:28:22 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Nov 05 16:28:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.13@o2ib6, removing former export from same NID Nov 05 16:28:22 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Nov 05 16:28:23 fir-md1-s1 kernel: Lustre: 41041:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000096/real 1573000096] req@ffffa12994b52880 x1649322770654496/t0(0) o106->fir-MDT0000@10.8.30.2@o2ib6:15/16 lens 296/280 e 0 to 1 dl 1573000103 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Nov 05 16:28:23 fir-md1-s1 kernel: Lustre: 41041:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages Nov 05 16:28:25 fir-md1-s1 kernel: LustreError: 41743:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffffa112776c7850 x1649289110301744/t0(0) o4->3c020cd0-089d-acb1-e879-86429192cebf@10.8.27.2@o2ib6:666/0 lens 488/448 e 1 to 0 dl 1573000131 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:28:25 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 3c020cd0-089d-acb1-e879-86429192cebf (at 10.8.27.2@o2ib6), client will retry: rc = -110 Nov 05 16:28:25 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Nov 05 16:28:30 fir-md1-s1 kernel: LustreError: 41659:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffffa10c708e4050 x1649043652512016/t0(0) o4->0cc7c050-dcae-faee-002b-f33d51a47d1f@10.8.30.2@o2ib6:668/0 lens 488/448 e 1 to 0 dl 1573000133 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:28:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 0cc7c050-dcae-faee-002b-f33d51a47d1f (at 10.8.30.2@o2ib6), client will retry: rc = -110 Nov 05 16:28:32 fir-md1-s1 kernel: Lustre: 41427:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000105/real 1573000105] req@ffffa11b11027080 x1649322770939696/t0(0) o104->fir-MDT0000@10.8.27.1@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573000112 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Nov 05 16:28:32 fir-md1-s1 kernel: Lustre: 41427:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3 previous similar messages Nov 05 16:28:39 fir-md1-s1 kernel: LNet: 19760:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.203@o2ib7 Nov 05 16:28:40 fir-md1-s1 kernel: LustreError: 42001:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffffa118c4e40850 x1648658799343200/t0(0) o4->ee8a8d10-65c2-ae96-bc67-9f6bae32e110@10.8.18.18@o2ib6:679/0 lens 488/448 e 1 to 0 dl 1573000144 ref 1 fl Interpret:/2/0 rc 0/0 Nov 05 16:28:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with ee8a8d10-65c2-ae96-bc67-9f6bae32e110 (at 10.8.18.18@o2ib6), client will retry: rc = -110 Nov 05 16:28:42 fir-md1-s1 kernel: LNet: 19759:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.203@o2ib7 Nov 05 16:28:44 fir-md1-s1 kernel: LNet: 19758:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.203@o2ib7 Nov 05 16:28:50 fir-md1-s1 kernel: Lustre: 41265:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000123/real 1573000123] req@ffffa12e362b5a00 x1649322771398192/t0(0) o104->fir-MDT0000@10.8.0.82@o2ib6:15/16 lens 296/224 e 0 to 1 dl 1573000130 ref 1 fl Rpc:X/2/ffffffff rc -11/-1 Nov 05 16:28:50 fir-md1-s1 kernel: Lustre: 41265:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages Nov 05 16:29:05 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.24.5@o2ib6) Nov 05 16:29:05 fir-md1-s1 kernel: Lustre: Skipped 384 previous similar messages Nov 05 16:29:05 fir-md1-s1 kernel: LustreError: 25035:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk READ req@ffffa13e299bc050 x1649402991655360/t0(0) o256->88a01b47-775e-e868-16e1-6d2ad521f00e@10.8.31.10@o2ib6:703/0 lens 304/240 e 2 to 0 dl 1573000168 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:29:09 fir-md1-s1 kernel: LNet: 19757:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.203@o2ib7 Nov 05 16:29:09 fir-md1-s1 kernel: LNet: 19757:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 1 previous similar message Nov 05 16:29:17 fir-md1-s1 kernel: LustreError: 22257:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk READ req@ffffa13e362ef050 x1648659624839536/t0(0) o256->dc60d28a-3d41-93f3-b87b-309e888fbc0c@10.8.18.26@o2ib6:715/0 lens 304/240 e 1 to 0 dl 1573000180 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:29:17 fir-md1-s1 kernel: LNet: 19757:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.203@o2ib7 Nov 05 16:29:17 fir-md1-s1 kernel: LNet: 19757:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 1 previous similar message Nov 05 16:29:26 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.32@o2ib6, removing former export from same NID Nov 05 16:29:26 fir-md1-s1 kernel: Lustre: Skipped 264 previous similar messages Nov 05 16:29:58 fir-md1-s1 kernel: LustreError: 25050:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk READ req@ffffa11e3045a050 x1648330586189808/t0(0) o256->cb8f7a50-86cc-6c1a-eafe-baacf1a8212b@10.8.26.34@o2ib6:748/0 lens 304/240 e 1 to 0 dl 1573000213 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:29:58 fir-md1-s1 kernel: LustreError: 25050:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 2 previous similar messages Nov 05 16:29:59 fir-md1-s1 kernel: LustreError: 41996:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(131072) req@ffffa10ee9970850 x1648658799343200/t0(0) o4->ee8a8d10-65c2-ae96-bc67-9f6bae32e110@10.8.18.18@o2ib6:743/0 lens 488/448 e 2 to 0 dl 1573000208 ref 1 fl Interpret:/2/0 rc 0/0 Nov 05 16:29:59 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with ee8a8d10-65c2-ae96-bc67-9f6bae32e110 (at 10.8.18.18@o2ib6), client will retry: rc = -110 Nov 05 16:30:35 fir-md1-s1 kernel: Lustre: 41050:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000228/real 1573000228] req@ffffa13410bff500 x1649322775870656/t0(0) o104->fir-MDT0000@10.9.104.71@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573000235 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Nov 05 16:30:35 fir-md1-s1 kernel: Lustre: 41050:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message Nov 05 16:30:36 fir-md1-s1 kernel: LustreError: 41570:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffffa12e3612d050 x1649296152989744/t0(0) o4->71ddb392-3190-c4dc-8641-8394d6133acc@10.9.104.23@o2ib4:115/0 lens 488/448 e 0 to 0 dl 1573000335 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:30:36 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 71ddb392-3190-c4dc-8641-8394d6133acc (at 10.9.104.23@o2ib4), client will retry: rc = -110 Nov 05 16:30:37 fir-md1-s1 kernel: LNet: 19750:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.212@o2ib7 Nov 05 16:30:37 fir-md1-s1 kernel: LNet: 19750:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) Skipped 1 previous similar message Nov 05 16:30:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ef7799b5-3274-30a3-b849-8ee207a32daa (at 10.9.115.11@o2ib4), client will retry: rc -110 Nov 05 16:30:44 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 71ddb392-3190-c4dc-8641-8394d6133acc (at 10.9.104.23@o2ib4), client will retry: rc = -110 Nov 05 16:30:44 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Nov 05 16:30:53 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 67fb2ec4-7a5a-f103-4386-bc08c967f193 (at 10.9.107.9@o2ib4) reconnecting Nov 05 16:30:53 fir-md1-s1 kernel: Lustre: Skipped 483 previous similar messages Nov 05 16:31:34 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.2@o2ib6, removing former export from same NID Nov 05 16:31:34 fir-md1-s1 kernel: Lustre: Skipped 301 previous similar messages Nov 05 16:31:38 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 07312e22-36ea-cbe1-f5a7-b2f2d00651b0 (at 10.8.20.22@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa1229937c000, cur 1573000298 expire 1573000148 last 1573000071 Nov 05 16:31:38 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:31:39 fir-md1-s1 kernel: LustreError: 42001:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(32768) req@ffffa112776c0850 x1649389691262768/t0(0) o3->ef7799b5-3274-30a3-b849-8ee207a32daa@10.9.115.11@o2ib4:98/0 lens 488/440 e 1 to 0 dl 1573000318 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:31:39 fir-md1-s1 kernel: LustreError: 41625:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(16406) req@ffffa13765a5c050 x1649295066581344/t0(0) o4->37bab6df-e097-67db-8c07-f9a0551b2beb@10.9.109.61@o2ib4:103/0 lens 488/448 e 2 to 0 dl 1573000323 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 16:31:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 37bab6df-e097-67db-8c07-f9a0551b2beb (at 10.9.109.61@o2ib4), client will retry: rc = -110 Nov 05 16:31:39 fir-md1-s1 kernel: Lustre: Skipped 12 previous similar messages Nov 05 16:31:39 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ef7799b5-3274-30a3-b849-8ee207a32daa (at 10.9.115.11@o2ib4), client will retry: rc -110 Nov 05 16:31:39 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:32:04 fir-md1-s1 kernel: LustreError: 41582:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(32829) req@ffffa12d54782050 x1648834534079376/t0(0) o4->d3880b5c-b72c-0e9f-b18a-d6299f066ebd@10.9.107.13@o2ib4:139/0 lens 488/448 e 0 to 0 dl 1573000359 ref 1 fl Interpret:/2/0 rc 0/0 Nov 05 16:32:04 fir-md1-s1 kernel: LustreError: 41582:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) Skipped 5 previous similar messages Nov 05 16:32:06 fir-md1-s1 kernel: Lustre: 41452:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573000318/real 1573000318] req@ffffa13411223600 x1649322775988864/t0(0) o104->fir-MDT0000@10.9.109.61@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573000325 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Nov 05 16:32:06 fir-md1-s1 kernel: Lustre: 41452:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 10 previous similar messages Nov 05 16:32:07 fir-md1-s1 kernel: LustreError: 41596:0:(ldlm_lib.c:3256:target_bulk_io()) @@@ Reconnect on bulk WRITE req@ffffa13ce1af9850 x1649295066581680/t0(0) o4->37bab6df-e097-67db-8c07-f9a0551b2beb@10.9.109.61@o2ib4:141/0 lens 488/448 e 0 to 0 dl 1573000361 ref 1 fl Interpret:/2/0 rc 0/0 Nov 05 16:32:07 fir-md1-s1 kernel: LustreError: 41596:0:(ldlm_lib.c:3256:target_bulk_io()) Skipped 27 previous similar messages Nov 05 16:32:10 fir-md1-s1 kernel: LNet: 19752:0:(o2iblnd_cb.c:413:kiblnd_handle_rx()) PUT_NACK from 10.0.10.212@o2ib7 Nov 05 16:32:21 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 71ddb392-3190-c4dc-8641-8394d6133acc (at 10.9.104.23@o2ib4), client will retry: rc = -110 Nov 05 16:32:21 fir-md1-s1 kernel: Lustre: Skipped 11 previous similar messages Nov 05 16:32:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client d8a07f77-ab6a-3cfc-cae0-aecee82b5ebd (at 10.9.108.41@o2ib4) in 170 seconds. I think it's dead, and I am evicting it. exp ffffa12d332da000, cur 1573000374 expire 1573000224 last 1573000204 Nov 05 16:32:54 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 16:33:19 fir-md1-s1 kernel: LustreError: 41601:0:(ldlm_lib.c:3271:target_bulk_io()) @@@ truncated bulk READ 0(32768) req@ffffa118c4e46050 x1649389691262256/t0(0) o3->ef7799b5-3274-30a3-b849-8ee207a32daa@10.9.115.11@o2ib4:191/0 lens 488/440 e 1 to 0 dl 1573000411 ref 1 fl Interpret:/2/0 rc 0/0 Nov 05 16:33:19 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO read error with ef7799b5-3274-30a3-b849-8ee207a32daa (at 10.9.115.11@o2ib4), client will retry: rc -110 Nov 05 16:33:46 fir-md1-s1 kernel: LustreError: 21591:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.9.107.12@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffffa11497567740/0x6756830164b0b85 lrc: 3/0,0 mode: PR/PR res: [0x200035664:0x68e2:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x60200400000020 nid: 10.9.107.12@o2ib4 remote: 0xb47a9c5113366023 expref: 8641 pid: 41427 timeout: 89703 lvb_type: 0 Nov 05 16:33:49 fir-md1-s1 kernel: LustreError: 21591:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.8.18.18@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffffa10cf0eb3600/0x675683016c804a3 lrc: 3/0,0 mode: PW/PW res: [0x200034e88:0x4c1:0x0].0x0 bits 0x40/0x0 rrc: 79 type: IBT flags: 0x60200400000020 nid: 10.8.18.18@o2ib6 remote: 0x6a6cc0363e89d325 expref: 566 pid: 41241 timeout: 89706 lvb_type: 0 Nov 05 16:33:49 fir-md1-s1 kernel: LustreError: 21591:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 1 previous similar message Nov 05 16:33:59 fir-md1-s1 kernel: LustreError: 41581:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 32768 GRANT, real grant 20480 Nov 05 16:34:01 fir-md1-s1 kernel: LustreError: 41996:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 36864 GRANT, real grant 0 Nov 05 16:34:01 fir-md1-s1 kernel: LustreError: 41996:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 101 previous similar messages Nov 05 16:34:02 fir-md1-s1 kernel: LustreError: 41561:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 45056 GRANT, real grant 0 Nov 05 16:34:04 fir-md1-s1 kernel: LustreError: 41582:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 45056 GRANT, real grant 0 Nov 05 16:34:04 fir-md1-s1 kernel: LustreError: 41582:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 3 previous similar messages Nov 05 16:34:11 fir-md1-s1 kernel: LustreError: 41762:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 16:34:11 fir-md1-s1 kernel: LustreError: 41762:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 190 previous similar messages Nov 05 16:34:20 fir-md1-s1 kernel: LustreError: 41661:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 45056 GRANT, real grant 0 Nov 05 16:34:20 fir-md1-s1 kernel: LustreError: 41661:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 334 previous similar messages Nov 05 16:34:37 fir-md1-s1 kernel: LustreError: 41625:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 32768 GRANT, real grant 0 Nov 05 16:34:37 fir-md1-s1 kernel: LustreError: 41625:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 433 previous similar messages Nov 05 16:35:10 fir-md1-s1 kernel: LustreError: 41587:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 69632 GRANT, real grant 0 Nov 05 16:35:10 fir-md1-s1 kernel: LustreError: 41587:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 851 previous similar messages Nov 05 16:35:29 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 16:35:29 fir-md1-s1 kernel: LustreError: Skipped 1158 previous similar messages Nov 05 16:36:17 fir-md1-s1 kernel: LustreError: 41564:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 16:36:17 fir-md1-s1 kernel: LustreError: 41564:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 206 previous similar messages Nov 05 16:38:22 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.108.1@o2ib4, removing former export from same NID Nov 05 16:38:22 fir-md1-s1 kernel: Lustre: Skipped 766 previous similar messages Nov 05 16:38:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.108.1@o2ib4) Nov 05 16:38:22 fir-md1-s1 kernel: Lustre: Skipped 2426 previous similar messages Nov 05 16:38:29 fir-md1-s1 kernel: LustreError: 41582:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 32768 GRANT, real grant 0 Nov 05 16:38:29 fir-md1-s1 kernel: LustreError: 41582:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 715 previous similar messages Nov 05 16:39:24 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c5f752ad-463e-c0fc-bbb6-5b29206ddbd4 (at 10.9.109.53@o2ib4) reconnecting Nov 05 16:39:24 fir-md1-s1 kernel: Lustre: Skipped 982 previous similar messages Nov 05 16:42:51 fir-md1-s1 kernel: LustreError: 41637:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 16:42:51 fir-md1-s1 kernel: LustreError: 41637:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1236 previous similar messages Nov 05 16:45:31 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 16:45:31 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 16:51:55 fir-md1-s1 kernel: LustreError: 41637:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 36864 GRANT, real grant 0 Nov 05 16:51:55 fir-md1-s1 kernel: LustreError: 41637:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2884 previous similar messages Nov 05 16:53:13 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Nov 05 16:53:13 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.212@o2ib7 (38): c: 8, oc: 0, rc: 8 Nov 05 16:53:29 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Nov 05 16:53:29 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.203@o2ib7 (105): c: 7, oc: 0, rc: 8 Nov 05 16:55:28 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.114.13@o2ib4, removing former export from same NID Nov 05 16:55:28 fir-md1-s1 kernel: Lustre: Skipped 4 previous similar messages Nov 05 16:55:28 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.114.13@o2ib4) Nov 05 16:55:28 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Nov 05 16:55:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 16:55:33 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 16:57:47 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client f5b732ff-4959-4283-d29c-fcd8fac11c91 (at 10.9.113.1@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa124a92bc400, cur 1573001867 expire 1573001717 last 1573001640 Nov 05 16:57:47 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 17:01:56 fir-md1-s1 kernel: LustreError: 41561:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 36864 GRANT, real grant 0 Nov 05 17:01:56 fir-md1-s1 kernel: LustreError: 41561:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1298 previous similar messages Nov 05 17:05:31 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.27@o2ib4) Nov 05 17:05:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 17:05:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 17:05:35 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 17:11:57 fir-md1-s1 kernel: LustreError: 41773:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 17:11:57 fir-md1-s1 kernel: LustreError: 41773:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5032 previous similar messages Nov 05 17:15:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 17:15:37 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 17:16:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.101.43@o2ib4) Nov 05 17:16:48 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 17:21:59 fir-md1-s1 kernel: LustreError: 41593:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 17:21:59 fir-md1-s1 kernel: LustreError: 41593:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2197 previous similar messages Nov 05 17:25:39 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 17:25:39 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 17:27:07 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.103.70@o2ib4, removing former export from same NID Nov 05 17:27:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 17:27:07 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.103.70@o2ib4) Nov 05 17:27:07 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 17:32:03 fir-md1-s1 kernel: LustreError: 41667:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 17:32:03 fir-md1-s1 kernel: LustreError: 41667:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 920 previous similar messages Nov 05 17:35:41 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 17:35:41 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 17:42:04 fir-md1-s1 kernel: LustreError: 41731:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 36864 GRANT, real grant 0 Nov 05 17:42:04 fir-md1-s1 kernel: LustreError: 41731:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2166 previous similar messages Nov 05 17:45:43 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 17:45:43 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 17:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 24e9e2b8-c701-7886-1991-a2238348e3e1 (at 10.9.108.11@o2ib4) reconnecting Nov 05 17:51:48 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.108.11@o2ib4) Nov 05 17:52:04 fir-md1-s1 kernel: LustreError: 41584:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 17:52:04 fir-md1-s1 kernel: LustreError: 41584:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 5566 previous similar messages Nov 05 17:55:45 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 17:55:45 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 17:59:00 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client a6d8bbe8-f1aa-21ab-d121-4c8801f56577 (at 10.8.27.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa13dc6c3c800, cur 1573005540 expire 1573005390 last 1573005313 Nov 05 17:59:00 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 17:59:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client a3c2090f-0eca-cd39-65df-d4c926cfe4e9 (at 10.8.27.24@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12dd9d03000, cur 1573005554 expire 1573005404 last 1573005327 Nov 05 18:00:26 fir-md1-s1 kernel: LNetError: 19762:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:01:45 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.24@o2ib4) Nov 05 18:02:05 fir-md1-s1 kernel: LustreError: 41652:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 36864 GRANT, real grant 0 Nov 05 18:02:05 fir-md1-s1 kernel: LustreError: 41652:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2982 previous similar messages Nov 05 18:02:37 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.104.72@o2ib4, removing former export from same NID Nov 05 18:04:59 fir-md1-s1 kernel: LNetError: 19763:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:05:35 fir-md1-s1 kernel: LNetError: 19756:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:05:48 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 18:05:48 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 18:10:48 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.106.16@o2ib4) Nov 05 18:10:48 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Nov 05 18:12:06 fir-md1-s1 kernel: LustreError: 41560:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 18:12:06 fir-md1-s1 kernel: LustreError: 41560:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 3938 previous similar messages Nov 05 18:15:50 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 18:15:50 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 18:21:53 fir-md1-s1 kernel: LNetError: 19757:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:22:07 fir-md1-s1 kernel: LustreError: 41799:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 18:22:07 fir-md1-s1 kernel: LustreError: 41799:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1508 previous similar messages Nov 05 18:24:59 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.27.24@o2ib6) Nov 05 18:24:59 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 18:25:52 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 18:25:52 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 18:30:11 fir-md1-s1 kernel: LNetError: 19757:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:32:08 fir-md1-s1 kernel: LustreError: 41562:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 18:32:08 fir-md1-s1 kernel: LustreError: 41562:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2749 previous similar messages Nov 05 18:33:50 fir-md1-s1 kernel: LNet: Service thread pid 41058 was inactive for 200.29s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 05 18:33:50 fir-md1-s1 kernel: LNet: Skipped 4 previous similar messages Nov 05 18:33:50 fir-md1-s1 kernel: Pid: 41058, comm: mdt00_012 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 18:33:50 fir-md1-s1 kernel: Call Trace: Nov 05 18:33:50 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] osp_remote_sync+0xd3/0x200 [osp] Nov 05 18:33:50 fir-md1-s1 kernel: [] osp_attr_get+0x463/0x730 [osp] Nov 05 18:33:50 fir-md1-s1 kernel: [] osp_object_init+0x16d/0x2d0 [osp] Nov 05 18:33:50 fir-md1-s1 kernel: [] lu_object_start.isra.35+0x8b/0x120 [obdclass] Nov 05 18:33:50 fir-md1-s1 kernel: [] lu_object_find_at+0x1e1/0xa60 [obdclass] Nov 05 18:33:50 fir-md1-s1 kernel: [] lu_object_find_slice+0x1f/0x90 [obdclass] Nov 05 18:33:50 fir-md1-s1 kernel: [] mdd_object_find+0x10/0x70 [mdd] Nov 05 18:33:50 fir-md1-s1 kernel: [] obf_lookup+0x2c9/0x350 [mdd] Nov 05 18:33:50 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0xf7c/0x1c30 [mdt] Nov 05 18:33:50 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Nov 05 18:33:50 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 18:33:50 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 18:33:50 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 18:33:50 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 18:33:50 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 18:33:50 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1573007630.41058 Nov 05 18:35:54 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 18:35:54 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 18:39:34 fir-md1-s1 kernel: LNetError: 19757:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:40:24 fir-md1-s1 kernel: Lustre: 41202:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa10ae76a1b00 x1649319045476368/t0(0) o101->fddec74f-f1aa-cb90-958b-39ca20c60eb9@10.0.10.3@o2ib7:259/0 lens 592/3264 e 24 to 0 dl 1573008029 ref 2 fl Interpret:/0/0 rc 0/0 Nov 05 18:40:24 fir-md1-s1 kernel: Lustre: 41202:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3 previous similar messages Nov 05 18:40:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 18:40:30 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 18:40:30 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 18:42:08 fir-md1-s1 kernel: LustreError: 41564:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 18:42:08 fir-md1-s1 kernel: LustreError: 41564:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 3220 previous similar messages Nov 05 18:44:01 fir-md1-s1 kernel: LNetError: 19754:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:45:56 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 18:45:56 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 18:49:46 fir-md1-s1 kernel: LNetError: 19755:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 18:50:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 18:52:10 fir-md1-s1 kernel: LustreError: 41667:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 18:52:10 fir-md1-s1 kernel: LustreError: 41667:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 621 previous similar messages Nov 05 18:52:54 fir-md1-s1 kernel: perf: interrupt took too long (3914 > 3912), lowering kernel.perf_event_max_sample_rate to 51000 Nov 05 18:55:43 fir-md1-s1 kernel: LNetError: 19760:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 18:55:58 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 18:55:58 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 18:59:33 fir-md1-s1 kernel: LNetError: 19759:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 19:00:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 19:00:32 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 19:02:13 fir-md1-s1 kernel: LustreError: 41660:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 32768 GRANT, real grant 0 Nov 05 19:02:13 fir-md1-s1 kernel: LustreError: 41660:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 3330 previous similar messages Nov 05 19:04:13 fir-md1-s1 kernel: LNetError: 19756:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 19:06:00 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 19:06:00 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 19:10:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 19:10:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 19:12:18 fir-md1-s1 kernel: LustreError: 41601:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 19:12:18 fir-md1-s1 kernel: LustreError: 41601:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1808 previous similar messages Nov 05 19:16:02 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 19:16:02 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 19:16:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 8088fb82-69d6-5f55-6a4c-3369f0c19cb6 (at 10.9.113.2@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa1392a2dc000, cur 1573010167 expire 1573010017 last 1573009940 Nov 05 19:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 19:20:35 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 19:21:15 fir-md1-s1 kernel: LNetError: 19753:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 19:22:20 fir-md1-s1 kernel: LustreError: 42003:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 19:22:20 fir-md1-s1 kernel: LustreError: 42003:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1816 previous similar messages Nov 05 19:26:04 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 19:26:04 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 19:28:25 fir-md1-s1 kernel: LNetError: 19759:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 19:29:41 fir-md1-s1 kernel: LNet: Service thread pid 41058 completed after 3552.15s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Nov 05 19:29:41 fir-md1-s1 kernel: LNet: Skipped 422 previous similar messages Nov 05 19:32:24 fir-md1-s1 kernel: LustreError: 40680:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 155648 GRANT, real grant 0 Nov 05 19:32:24 fir-md1-s1 kernel: LustreError: 40680:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 848 previous similar messages Nov 05 19:36:07 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 19:36:07 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 19:42:28 fir-md1-s1 kernel: LustreError: 41247:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 19:42:28 fir-md1-s1 kernel: LustreError: 41247:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1739 previous similar messages Nov 05 19:46:09 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 19:46:09 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 19:52:31 fir-md1-s1 kernel: LustreError: 41804:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 36864 GRANT, real grant 0 Nov 05 19:52:31 fir-md1-s1 kernel: LustreError: 41804:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2198 previous similar messages Nov 05 19:56:11 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 19:56:11 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:02:35 fir-md1-s1 kernel: LustreError: 41582:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 20:02:35 fir-md1-s1 kernel: LustreError: 41582:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 533 previous similar messages Nov 05 20:04:22 fir-md1-s1 kernel: LNetError: 19760:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 20:04:22 fir-md1-s1 kernel: LNetError: 19760:0:(lib-msg.c:822:lnet_is_health_check()) Skipped 1 previous similar message Nov 05 20:06:13 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 20:06:13 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:07:00 fir-md1-s1 kernel: LNetError: 19759:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 20:08:46 fir-md1-s1 kernel: LNet: Service thread pid 41202 was inactive for 200.31s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 05 20:08:46 fir-md1-s1 kernel: Pid: 41202, comm: mdt00_044 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 20:08:46 fir-md1-s1 kernel: Call Trace: Nov 05 20:08:46 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x480/0x790 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] ptlrpc_queue_wait+0x83/0x230 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] osp_remote_sync+0xd3/0x200 [osp] Nov 05 20:08:46 fir-md1-s1 kernel: [] osp_attr_get+0x463/0x730 [osp] Nov 05 20:08:46 fir-md1-s1 kernel: [] osp_object_init+0x16d/0x2d0 [osp] Nov 05 20:08:46 fir-md1-s1 kernel: [] lu_object_start.isra.35+0x8b/0x120 [obdclass] Nov 05 20:08:46 fir-md1-s1 kernel: [] lu_object_find_at+0x1e1/0xa60 [obdclass] Nov 05 20:08:46 fir-md1-s1 kernel: [] lu_object_find_slice+0x1f/0x90 [obdclass] Nov 05 20:08:46 fir-md1-s1 kernel: [] mdd_object_find+0x10/0x70 [mdd] Nov 05 20:08:46 fir-md1-s1 kernel: [] obf_lookup+0x2c9/0x350 [mdd] Nov 05 20:08:46 fir-md1-s1 kernel: [] mdt_getattr_name_lock+0xf7c/0x1c30 [mdt] Nov 05 20:08:46 fir-md1-s1 kernel: [] mdt_intent_getattr+0x2b5/0x480 [mdt] Nov 05 20:08:46 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 20:08:46 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:08:46 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:08:46 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:08:46 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 20:08:46 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1573013326.41202 Nov 05 20:12:36 fir-md1-s1 kernel: LustreError: 41757:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 20:12:36 fir-md1-s1 kernel: LustreError: 41757:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1597 previous similar messages Nov 05 20:15:21 fir-md1-s1 kernel: Lustre: 41365:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa106a93bbf00 x1649319221746896/t0(0) o101->fddec74f-f1aa-cb90-958b-39ca20c60eb9@10.0.10.3@o2ib7:671/0 lens 592/3264 e 24 to 0 dl 1573013726 ref 2 fl Interpret:/0/0 rc 0/0 Nov 05 20:15:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 20:15:27 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 20:16:15 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 20:16:15 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:22:34 fir-md1-s1 kernel: LNetError: 19763:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 20:22:38 fir-md1-s1 kernel: LustreError: 41634:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 45056 GRANT, real grant 0 Nov 05 20:22:38 fir-md1-s1 kernel: LustreError: 41634:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 2168 previous similar messages Nov 05 20:25:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 20:25:28 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 20:26:17 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 20:26:17 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:32:39 fir-md1-s1 kernel: LustreError: 41760:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 20:32:39 fir-md1-s1 kernel: LustreError: 41760:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1127 previous similar messages Nov 05 20:35:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fddec74f-f1aa-cb90-958b-39ca20c60eb9 (at 10.0.10.3@o2ib7) reconnecting Nov 05 20:35:29 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.0.10.3@o2ib7) Nov 05 20:36:19 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 20:36:19 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:42:41 fir-md1-s1 kernel: LustreError: 41741:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 37bab6df-e097-67db-8c07-f9a0551b2beb claims 28672 GRANT, real grant 0 Nov 05 20:42:41 fir-md1-s1 kernel: LustreError: 41741:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 4123 previous similar messages Nov 05 20:45:08 fir-md1-s1 kernel: LNet: Service thread pid 41202 completed after 2382.07s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Nov 05 20:45:16 fir-md1-s1 kernel: LNetError: 19762:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (0, 5) Nov 05 20:46:21 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 20:46:21 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:52:44 fir-md1-s1 kernel: LustreError: 41600:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 30c13ed2-9906-06c4-2c41-802cab1c4632 claims 28672 GRANT, real grant 0 Nov 05 20:52:44 fir-md1-s1 kernel: LustreError: 41600:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 1343 previous similar messages Nov 05 20:54:16 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds Nov 05 20:56:23 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 20:56:23 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 20:58:17 fir-md1-s1 kernel: list passed to list_sort() too long for efficiency Nov 05 20:58:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 4737d7cc-3e1f-a8cc-964f-c8d597fce061 (at 10.8.27.25@o2ib6) reconnecting Nov 05 20:58:17 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.25@o2ib6) Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [mdt_io00_052:41794] Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [mdt01_055:41234] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: Nov 05 20:58:17 fir-md1-s1 kernel: lustre(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mgs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osp(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdd(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lod(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdt(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lfsck(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mgc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osd_ldiskfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lquota(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ldiskfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lmv(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lov(OE) Nov 05 20:58:17 fir-md1-s1 kernel: fid(OE) Nov 05 20:58:17 fir-md1-s1 kernel: fld(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ko2iblnd(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ptlrpc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: obdclass(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lnet(OE) Nov 05 20:58:17 fir-md1-s1 kernel: libcfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: rpcsec_gss_krb5 Nov 05 20:58:17 fir-md1-s1 kernel: auth_rpcgss Nov 05 20:58:17 fir-md1-s1 kernel: nfsv4 Nov 05 20:58:17 fir-md1-s1 kernel: dns_resolver Nov 05 20:58:17 fir-md1-s1 kernel: nfs Nov 05 20:58:17 fir-md1-s1 kernel: lockd Nov 05 20:58:17 fir-md1-s1 kernel: grace Nov 05 20:58:17 fir-md1-s1 kernel: fscache Nov 05 20:58:17 fir-md1-s1 kernel: rdma_ucm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_ucm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: rdma_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: iw_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_ipoib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_umad(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_en(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: dell_rbu Nov 05 20:58:17 fir-md1-s1 kernel: sunrpc Nov 05 20:58:17 fir-md1-s1 kernel: vfat Nov 05 20:58:17 fir-md1-s1 kernel: fat Nov 05 20:58:17 fir-md1-s1 kernel: dm_round_robin Nov 05 20:58:17 fir-md1-s1 kernel: amd64_edac_mod Nov 05 20:58:17 fir-md1-s1 kernel: edac_mce_amd Nov 05 20:58:17 fir-md1-s1 kernel: kvm_amd Nov 05 20:58:17 fir-md1-s1 kernel: kvm Nov 05 20:58:17 fir-md1-s1 kernel: irqbypass Nov 05 20:58:17 fir-md1-s1 kernel: crc32_pclmul Nov 05 20:58:17 fir-md1-s1 kernel: ghash_clmulni_intel Nov 05 20:58:17 fir-md1-s1 kernel: aesni_intel Nov 05 20:58:17 fir-md1-s1 kernel: lrw Nov 05 20:58:17 fir-md1-s1 kernel: gf128mul Nov 05 20:58:17 fir-md1-s1 kernel: dcdbas Nov 05 20:58:17 fir-md1-s1 kernel: glue_helper Nov 05 20:58:17 fir-md1-s1 kernel: ablk_helper Nov 05 20:58:17 fir-md1-s1 kernel: ses Nov 05 20:58:17 fir-md1-s1 kernel: dm_multipath Nov 05 20:58:17 fir-md1-s1 kernel: enclosure Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_si Nov 05 20:58:17 fir-md1-s1 kernel: cryptd Nov 05 20:58:17 fir-md1-s1 kernel: sg Nov 05 20:58:17 fir-md1-s1 kernel: dm_mod Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_devintf Nov 05 20:58:17 fir-md1-s1 kernel: pcspkr Nov 05 20:58:17 fir-md1-s1 kernel: ccp Nov 05 20:58:17 fir-md1-s1 kernel: k10temp Nov 05 20:58:17 fir-md1-s1 kernel: i2c_piix4 Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_msghandler Nov 05 20:58:17 fir-md1-s1 kernel: acpi_power_meter Nov 05 20:58:17 fir-md1-s1 kernel: ip_tables Nov 05 20:58:17 fir-md1-s1 kernel: ext4 Nov 05 20:58:17 fir-md1-s1 kernel: mbcache Nov 05 20:58:17 fir-md1-s1 kernel: jbd2 Nov 05 20:58:17 fir-md1-s1 kernel: sd_mod Nov 05 20:58:17 fir-md1-s1 kernel: crc_t10dif Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_generic Nov 05 20:58:17 fir-md1-s1 kernel: mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: i2c_algo_bit Nov 05 20:58:17 fir-md1-s1 kernel: ib_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: drm_kms_helper Nov 05 20:58:17 fir-md1-s1 kernel: syscopyarea Nov 05 20:58:17 fir-md1-s1 kernel: sysfillrect Nov 05 20:58:17 fir-md1-s1 kernel: sysimgblt Nov 05 20:58:17 fir-md1-s1 kernel: ahci Nov 05 20:58:17 fir-md1-s1 kernel: fb_sys_fops Nov 05 20:58:17 fir-md1-s1 kernel: mlx5_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ttm Nov 05 20:58:17 fir-md1-s1 kernel: libahci Nov 05 20:58:17 fir-md1-s1 kernel: mlxfw(OE) Nov 05 20:58:17 fir-md1-s1 kernel: devlink Nov 05 20:58:17 fir-md1-s1 kernel: mpt3sas(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx_compat(OE) Nov 05 20:58:17 fir-md1-s1 kernel: tg3 Nov 05 20:58:17 fir-md1-s1 kernel: drm Nov 05 20:58:17 fir-md1-s1 kernel: raid_class Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_pclmul Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_common Nov 05 20:58:17 fir-md1-s1 kernel: ptp Nov 05 20:58:17 fir-md1-s1 kernel: libata Nov 05 20:58:17 fir-md1-s1 kernel: megaraid_sas Nov 05 20:58:17 fir-md1-s1 kernel: scsi_transport_sas Nov 05 20:58:17 fir-md1-s1 kernel: crc32c_intel Nov 05 20:58:17 fir-md1-s1 kernel: drm_panel_orientation_quirks Nov 05 20:58:17 fir-md1-s1 kernel: pps_core Nov 05 20:58:17 fir-md1-s1 kernel: [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 1 PID: 41234 Comm: mdt01_055 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e370bc100 ti: ffffa11e2812c000 task.ti: ffffa11e2812c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] Nov 05 20:58:17 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e2812f930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a2f7828 RCX: 0000000000090000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f49b780 RSI: 0000000000590101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e2812f930 R08: ffffa11e3f61b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f61f0c0 R11: ffffda9191d2c600 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e2812f8c0 R15: ffffa11e2812f970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa11e3f600000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: Nov 05 20:58:17 fir-md1-s1 kernel: 13 Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: c1 Nov 05 20:58:17 fir-md1-s1 kernel: ea Nov 05 20:58:17 fir-md1-s1 kernel: 0d Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: 98 Nov 05 20:58:17 fir-md1-s1 kernel: 83 Nov 05 20:58:17 fir-md1-s1 kernel: e2 Nov 05 20:58:17 fir-md1-s1 kernel: 30 Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: 81 Nov 05 20:58:17 fir-md1-s1 kernel: c2 Nov 05 20:58:17 fir-md1-s1 kernel: 80 Nov 05 20:58:17 fir-md1-s1 kernel: b7 Nov 05 20:58:17 fir-md1-s1 kernel: 01 Nov 05 20:58:17 fir-md1-s1 kernel: 00 Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: 03 Nov 05 20:58:17 fir-md1-s1 kernel: 14 Nov 05 20:58:17 fir-md1-s1 kernel: c5 Nov 05 20:58:17 fir-md1-s1 kernel: e0 Nov 05 20:58:17 fir-md1-s1 kernel: bf Nov 05 20:58:17 fir-md1-s1 kernel: 54 Nov 05 20:58:17 fir-md1-s1 kernel: bf Nov 05 20:58:17 fir-md1-s1 kernel: 4c Nov 05 20:58:17 fir-md1-s1 kernel: 89 Nov 05 20:58:17 fir-md1-s1 kernel: 02 Nov 05 20:58:17 fir-md1-s1 kernel: 41 Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: 40 Nov 05 20:58:17 fir-md1-s1 kernel: 08 Nov 05 20:58:17 fir-md1-s1 kernel: 85 Nov 05 20:58:17 fir-md1-s1 kernel: c0 Nov 05 20:58:17 fir-md1-s1 kernel: 75 Nov 05 20:58:17 fir-md1-s1 kernel: 0f Nov 05 20:58:17 fir-md1-s1 kernel: 0f Nov 05 20:58:17 fir-md1-s1 kernel: 1f Nov 05 20:58:17 fir-md1-s1 kernel: 44 Nov 05 20:58:17 fir-md1-s1 kernel: 00 Nov 05 20:58:17 fir-md1-s1 kernel: 00 Nov 05 20:58:17 fir-md1-s1 kernel: f3 Nov 05 20:58:17 fir-md1-s1 kernel: 90 Nov 05 20:58:17 fir-md1-s1 kernel: <41> Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: 40 Nov 05 20:58:17 fir-md1-s1 kernel: 08 Nov 05 20:58:17 fir-md1-s1 kernel: 85 Nov 05 20:58:17 fir-md1-s1 kernel: c0 Nov 05 20:58:17 fir-md1-s1 kernel: 74 Nov 05 20:58:17 fir-md1-s1 kernel: f6 Nov 05 20:58:17 fir-md1-s1 kernel: 4d Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: 08 Nov 05 20:58:17 fir-md1-s1 kernel: 4d Nov 05 20:58:17 fir-md1-s1 kernel: 85 Nov 05 20:58:17 fir-md1-s1 kernel: c9 Nov 05 20:58:17 fir-md1-s1 kernel: 74 Nov 05 20:58:17 fir-md1-s1 kernel: 04 Nov 05 20:58:17 fir-md1-s1 kernel: 41 Nov 05 20:58:17 fir-md1-s1 kernel: 0f Nov 05 20:58:17 fir-md1-s1 kernel: 18 Nov 05 20:58:17 fir-md1-s1 kernel: 09 Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ldlm_cn02_013:52318] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: Nov 05 20:58:17 fir-md1-s1 kernel: lustre(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mgs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osp(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdd(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lod(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdt(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lfsck(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mgc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osd_ldiskfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lquota(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ldiskfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lmv(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lov(OE) Nov 05 20:58:17 fir-md1-s1 kernel: fid(OE) Nov 05 20:58:17 fir-md1-s1 kernel: fld(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ko2iblnd(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ptlrpc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: obdclass(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lnet(OE) Nov 05 20:58:17 fir-md1-s1 kernel: libcfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: rpcsec_gss_krb5 Nov 05 20:58:17 fir-md1-s1 kernel: auth_rpcgss Nov 05 20:58:17 fir-md1-s1 kernel: nfsv4 Nov 05 20:58:17 fir-md1-s1 kernel: dns_resolver Nov 05 20:58:17 fir-md1-s1 kernel: nfs Nov 05 20:58:17 fir-md1-s1 kernel: lockd Nov 05 20:58:17 fir-md1-s1 kernel: grace Nov 05 20:58:17 fir-md1-s1 kernel: fscache Nov 05 20:58:17 fir-md1-s1 kernel: rdma_ucm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_ucm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: rdma_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: iw_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_ipoib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_umad(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_en(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: dell_rbu Nov 05 20:58:17 fir-md1-s1 kernel: sunrpc Nov 05 20:58:17 fir-md1-s1 kernel: vfat Nov 05 20:58:17 fir-md1-s1 kernel: fat Nov 05 20:58:17 fir-md1-s1 kernel: dm_round_robin Nov 05 20:58:17 fir-md1-s1 kernel: amd64_edac_mod Nov 05 20:58:17 fir-md1-s1 kernel: edac_mce_amd Nov 05 20:58:17 fir-md1-s1 kernel: kvm_amd Nov 05 20:58:17 fir-md1-s1 kernel: kvm Nov 05 20:58:17 fir-md1-s1 kernel: irqbypass Nov 05 20:58:17 fir-md1-s1 kernel: crc32_pclmul Nov 05 20:58:17 fir-md1-s1 kernel: ghash_clmulni_intel Nov 05 20:58:17 fir-md1-s1 kernel: aesni_intel Nov 05 20:58:17 fir-md1-s1 kernel: lrw Nov 05 20:58:17 fir-md1-s1 kernel: gf128mul Nov 05 20:58:17 fir-md1-s1 kernel: dcdbas Nov 05 20:58:17 fir-md1-s1 kernel: glue_helper Nov 05 20:58:17 fir-md1-s1 kernel: ablk_helper Nov 05 20:58:17 fir-md1-s1 kernel: ses Nov 05 20:58:17 fir-md1-s1 kernel: dm_multipath Nov 05 20:58:17 fir-md1-s1 kernel: enclosure Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_si Nov 05 20:58:17 fir-md1-s1 kernel: cryptd Nov 05 20:58:17 fir-md1-s1 kernel: sg Nov 05 20:58:17 fir-md1-s1 kernel: dm_mod Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_devintf Nov 05 20:58:17 fir-md1-s1 kernel: pcspkr Nov 05 20:58:17 fir-md1-s1 kernel: ccp Nov 05 20:58:17 fir-md1-s1 kernel: k10temp Nov 05 20:58:17 fir-md1-s1 kernel: i2c_piix4 Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_msghandler Nov 05 20:58:17 fir-md1-s1 kernel: acpi_power_meter Nov 05 20:58:17 fir-md1-s1 kernel: ip_tables Nov 05 20:58:17 fir-md1-s1 kernel: ext4 Nov 05 20:58:17 fir-md1-s1 kernel: mbcache Nov 05 20:58:17 fir-md1-s1 kernel: jbd2 Nov 05 20:58:17 fir-md1-s1 kernel: sd_mod Nov 05 20:58:17 fir-md1-s1 kernel: crc_t10dif Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_generic Nov 05 20:58:17 fir-md1-s1 kernel: mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: i2c_algo_bit Nov 05 20:58:17 fir-md1-s1 kernel: ib_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: drm_kms_helper Nov 05 20:58:17 fir-md1-s1 kernel: syscopyarea Nov 05 20:58:17 fir-md1-s1 kernel: sysfillrect Nov 05 20:58:17 fir-md1-s1 kernel: sysimgblt Nov 05 20:58:17 fir-md1-s1 kernel: ahci Nov 05 20:58:17 fir-md1-s1 kernel: fb_sys_fops Nov 05 20:58:17 fir-md1-s1 kernel: mlx5_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ttm Nov 05 20:58:17 fir-md1-s1 kernel: libahci Nov 05 20:58:17 fir-md1-s1 kernel: mlxfw(OE) Nov 05 20:58:17 fir-md1-s1 kernel: devlink Nov 05 20:58:17 fir-md1-s1 kernel: mpt3sas(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx_compat(OE) Nov 05 20:58:17 fir-md1-s1 kernel: tg3 Nov 05 20:58:17 fir-md1-s1 kernel: drm Nov 05 20:58:17 fir-md1-s1 kernel: raid_class Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_pclmul Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_common Nov 05 20:58:17 fir-md1-s1 kernel: ptp Nov 05 20:58:17 fir-md1-s1 kernel: libata Nov 05 20:58:17 fir-md1-s1 kernel: megaraid_sas Nov 05 20:58:17 fir-md1-s1 kernel: scsi_transport_sas Nov 05 20:58:17 fir-md1-s1 kernel: crc32c_intel Nov 05 20:58:17 fir-md1-s1 kernel: drm_panel_orientation_quirks Nov 05 20:58:17 fir-md1-s1 kernel: pps_core Nov 05 20:58:17 fir-md1-s1 kernel: [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 2 PID: 52318 Comm: ldlm_cn02_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e32cfa080 ti: ffffa13716318000 task.ti: ffffa13716318000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] Nov 05 20:58:17 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1371631b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7caa038 RCX: 0000000000110000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f59b780 RSI: 0000000000d90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1371631b880 R08: ffffa12e3f61b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f61f0c0 R11: ffffda91ee4ef9c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1371631b810 R15: ffffa1371631b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f94d47aa880(0000) GS:ffffa12e3f600000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f94c3685010 CR3: 000000202a620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: Nov 05 20:58:17 fir-md1-s1 kernel: 13 Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: c1 Nov 05 20:58:17 fir-md1-s1 kernel: ea Nov 05 20:58:17 fir-md1-s1 kernel: 0d Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: 98 Nov 05 20:58:17 fir-md1-s1 kernel: 83 Nov 05 20:58:17 fir-md1-s1 kernel: e2 Nov 05 20:58:17 fir-md1-s1 kernel: 30 Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: 81 Nov 05 20:58:17 fir-md1-s1 kernel: c2 Nov 05 20:58:17 fir-md1-s1 kernel: 80 Nov 05 20:58:17 fir-md1-s1 kernel: b7 Nov 05 20:58:17 fir-md1-s1 kernel: 01 Nov 05 20:58:17 fir-md1-s1 kernel: 00 Nov 05 20:58:17 fir-md1-s1 kernel: 48 Nov 05 20:58:17 fir-md1-s1 kernel: 03 Nov 05 20:58:17 fir-md1-s1 kernel: 14 Nov 05 20:58:17 fir-md1-s1 kernel: c5 Nov 05 20:58:17 fir-md1-s1 kernel: e0 Nov 05 20:58:17 fir-md1-s1 kernel: bf Nov 05 20:58:17 fir-md1-s1 kernel: 54 Nov 05 20:58:17 fir-md1-s1 kernel: bf Nov 05 20:58:17 fir-md1-s1 kernel: 4c Nov 05 20:58:17 fir-md1-s1 kernel: 89 Nov 05 20:58:17 fir-md1-s1 kernel: 02 Nov 05 20:58:17 fir-md1-s1 kernel: 41 Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: 40 Nov 05 20:58:17 fir-md1-s1 kernel: 08 Nov 05 20:58:17 fir-md1-s1 kernel: 85 Nov 05 20:58:17 fir-md1-s1 kernel: c0 Nov 05 20:58:17 fir-md1-s1 kernel: 75 Nov 05 20:58:17 fir-md1-s1 kernel: 0f Nov 05 20:58:17 fir-md1-s1 kernel: 0f Nov 05 20:58:17 fir-md1-s1 kernel: 1f Nov 05 20:58:17 fir-md1-s1 kernel: 44 Nov 05 20:58:17 fir-md1-s1 kernel: 00 Nov 05 20:58:17 fir-md1-s1 kernel: 00 Nov 05 20:58:17 fir-md1-s1 kernel: f3 Nov 05 20:58:17 fir-md1-s1 kernel: 90 Nov 05 20:58:17 fir-md1-s1 kernel: <41> Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: 40 Nov 05 20:58:17 fir-md1-s1 kernel: 08 Nov 05 20:58:17 fir-md1-s1 kernel: 85 Nov 05 20:58:17 fir-md1-s1 kernel: c0 Nov 05 20:58:17 fir-md1-s1 kernel: 74 Nov 05 20:58:17 fir-md1-s1 kernel: f6 Nov 05 20:58:17 fir-md1-s1 kernel: 4d Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: 08 Nov 05 20:58:17 fir-md1-s1 kernel: 4d Nov 05 20:58:17 fir-md1-s1 kernel: 85 Nov 05 20:58:17 fir-md1-s1 kernel: c9 Nov 05 20:58:17 fir-md1-s1 kernel: 74 Nov 05 20:58:17 fir-md1-s1 kernel: 04 Nov 05 20:58:17 fir-md1-s1 kernel: 41 Nov 05 20:58:17 fir-md1-s1 kernel: 0f Nov 05 20:58:17 fir-md1-s1 kernel: 18 Nov 05 20:58:17 fir-md1-s1 kernel: 09 Nov 05 20:58:17 fir-md1-s1 kernel: 8b Nov 05 20:58:17 fir-md1-s1 kernel: Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ldlm_cn03_025:61222] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: Nov 05 20:58:17 fir-md1-s1 kernel: lustre(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mgs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osp(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdd(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lod(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mdt(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lfsck(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mgc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osd_ldiskfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lquota(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ldiskfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lmv(OE) Nov 05 20:58:17 fir-md1-s1 kernel: osc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lov(OE) Nov 05 20:58:17 fir-md1-s1 kernel: fid(OE) Nov 05 20:58:17 fir-md1-s1 kernel: fld(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ko2iblnd(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ptlrpc(OE) Nov 05 20:58:17 fir-md1-s1 kernel: obdclass(OE) Nov 05 20:58:17 fir-md1-s1 kernel: lnet(OE) Nov 05 20:58:17 fir-md1-s1 kernel: libcfs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: rpcsec_gss_krb5 Nov 05 20:58:17 fir-md1-s1 kernel: auth_rpcgss Nov 05 20:58:17 fir-md1-s1 kernel: nfsv4 Nov 05 20:58:17 fir-md1-s1 kernel: dns_resolver Nov 05 20:58:17 fir-md1-s1 kernel: nfs Nov 05 20:58:17 fir-md1-s1 kernel: lockd Nov 05 20:58:17 fir-md1-s1 kernel: grace Nov 05 20:58:17 fir-md1-s1 kernel: fscache Nov 05 20:58:17 fir-md1-s1 kernel: rdma_ucm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_ucm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: rdma_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: iw_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_ipoib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_cm(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_umad(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_en(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx4_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: dell_rbu Nov 05 20:58:17 fir-md1-s1 kernel: sunrpc Nov 05 20:58:17 fir-md1-s1 kernel: vfat Nov 05 20:58:17 fir-md1-s1 kernel: fat Nov 05 20:58:17 fir-md1-s1 kernel: dm_round_robin Nov 05 20:58:17 fir-md1-s1 kernel: amd64_edac_mod Nov 05 20:58:17 fir-md1-s1 kernel: edac_mce_amd Nov 05 20:58:17 fir-md1-s1 kernel: kvm_amd Nov 05 20:58:17 fir-md1-s1 kernel: kvm Nov 05 20:58:17 fir-md1-s1 kernel: irqbypass Nov 05 20:58:17 fir-md1-s1 kernel: crc32_pclmul Nov 05 20:58:17 fir-md1-s1 kernel: ghash_clmulni_intel Nov 05 20:58:17 fir-md1-s1 kernel: aesni_intel Nov 05 20:58:17 fir-md1-s1 kernel: lrw Nov 05 20:58:17 fir-md1-s1 kernel: gf128mul Nov 05 20:58:17 fir-md1-s1 kernel: dcdbas Nov 05 20:58:17 fir-md1-s1 kernel: glue_helper Nov 05 20:58:17 fir-md1-s1 kernel: ablk_helper Nov 05 20:58:17 fir-md1-s1 kernel: ses Nov 05 20:58:17 fir-md1-s1 kernel: dm_multipath Nov 05 20:58:17 fir-md1-s1 kernel: enclosure Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_si Nov 05 20:58:17 fir-md1-s1 kernel: cryptd Nov 05 20:58:17 fir-md1-s1 kernel: sg Nov 05 20:58:17 fir-md1-s1 kernel: dm_mod Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_devintf Nov 05 20:58:17 fir-md1-s1 kernel: pcspkr Nov 05 20:58:17 fir-md1-s1 kernel: ccp Nov 05 20:58:17 fir-md1-s1 kernel: k10temp Nov 05 20:58:17 fir-md1-s1 kernel: i2c_piix4 Nov 05 20:58:17 fir-md1-s1 kernel: ipmi_msghandler Nov 05 20:58:17 fir-md1-s1 kernel: acpi_power_meter Nov 05 20:58:17 fir-md1-s1 kernel: ip_tables Nov 05 20:58:17 fir-md1-s1 kernel: ext4 Nov 05 20:58:17 fir-md1-s1 kernel: mbcache Nov 05 20:58:17 fir-md1-s1 kernel: jbd2 Nov 05 20:58:17 fir-md1-s1 kernel: sd_mod Nov 05 20:58:17 fir-md1-s1 kernel: crc_t10dif Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_generic Nov 05 20:58:17 fir-md1-s1 kernel: mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: i2c_algo_bit Nov 05 20:58:17 fir-md1-s1 kernel: ib_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: drm_kms_helper Nov 05 20:58:17 fir-md1-s1 kernel: syscopyarea Nov 05 20:58:17 fir-md1-s1 kernel: sysfillrect Nov 05 20:58:17 fir-md1-s1 kernel: sysimgblt Nov 05 20:58:17 fir-md1-s1 kernel: ahci Nov 05 20:58:17 fir-md1-s1 kernel: fb_sys_fops Nov 05 20:58:17 fir-md1-s1 kernel: mlx5_core(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ttm Nov 05 20:58:17 fir-md1-s1 kernel: libahci Nov 05 20:58:17 fir-md1-s1 kernel: mlxfw(OE) Nov 05 20:58:17 fir-md1-s1 kernel: devlink Nov 05 20:58:17 fir-md1-s1 kernel: mpt3sas(OE) Nov 05 20:58:17 fir-md1-s1 kernel: mlx_compat(OE) Nov 05 20:58:17 fir-md1-s1 kernel: tg3 Nov 05 20:58:17 fir-md1-s1 kernel: drm Nov 05 20:58:17 fir-md1-s1 kernel: raid_class Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_pclmul Nov 05 20:58:17 fir-md1-s1 kernel: crct10dif_common Nov 05 20:58:17 fir-md1-s1 kernel: ptp Nov 05 20:58:17 fir-md1-s1 kernel: libata Nov 05 20:58:17 fir-md1-s1 kernel: megaraid_sas Nov 05 20:58:17 fir-md1-s1 kernel: scsi_transport_sas Nov 05 20:58:17 fir-md1-s1 kernel: crc32c_intel Nov 05 20:58:17 fir-md1-s1 kernel: drm_panel_orientation_quirks Nov 05 20:58:17 fir-md1-s1 kernel: pps_core Nov 05 20:58:17 fir-md1-s1 kernel: [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 3 PID: 61222 Comm: ldlm_cn03_025 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13e397dc100 ti: ffffa13679694000 task.ti: ffffa13679694000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] Nov 05 20:58:17 fir-md1-s1 kernel: [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13679697880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a320f78 RCX: 0000000000190000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f81b780 RSI: 0000000001110101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13679697880 R08: ffffa13e7f41b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f41f0c0 R11: ffffda920c54c8c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa13679697810 R15: ffffa136796978c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f69c209d740(0000) GS:ffffa13e7f400000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f69c20aa000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) ib_uverbs(OE) Nov 05 20:58:17 fir-md1-s1 kernel: i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 0 PID: 41794 Comm: mdt_io00_052 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa137ca681040 ti: ffffa10e221fc000 task.ti: ffffa10e221fc000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa10e221ff800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179ddeb5c0 RCX: 0000000000010000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3f09b780 RSI: 0000000001410101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa10e221ff800 R08: ffffa10e3ee1b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ee1f140 R11: ffffda91b8d49400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa10e221ff7a0 R14: ffffa1179ddeb328 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f4fff165880(0000) GS:ffffa10e3ee00000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f4fec48b9e4 CR3: 000000364fa10000 CR4: 00000000003407f0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? load_balance+0x1be/0x9a0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [mdt00_083:41365] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 4 PID: 41365 Comm: mdt00_083 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa127be26c100 ti: ffffa127c1ed8000 task.ti: ffffa127c1ed8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa127c1edb510 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7cb4250 RCX: 0000000000210000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3f05b780 RSI: 0000000001210101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa127c1edb510 R08: ffffa10e3ee5b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ee5f140 R11: ffffda9173e6a400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa127c1edb4b0 R14: ffffa129c7cb3fb8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fb30a5e6880(0000) GS:ffffa10e3ee40000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 00000010209aa000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_resource_putref+0x199/0x260 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [mdt_rdpg01_042:61620] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 5 PID: 61620 Comm: mdt_rdpg01_042 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa128b43b0000 ti: ffffa11f28f08000 task.ti: ffffa11f28f08000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] ldiskfs_inode_touch_time_cmp+0xd/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0000:ffffa11f28f0b728 EFLAGS: 00000282 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 8000040400080000 RBX: ffffffffbe89b6f4 RCX: 0000000103ce1c56 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa127acac4ee8 RSI: ffffa13d8ed0ad68 RDI: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11f28f0b778 R08: 000000000000000a R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: 0000000000001831 R11: ffffa11f28f0b486 R12: 0000000000000006 Nov 05 20:58:17 fir-md1-s1 kernel: R13: 0000000000000032 R14: 0000000000000000 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f2faba6b740(0000) GS:ffffa11e3f640000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00000000025a6028 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] ? merge+0x62/0xc0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] list_sort+0x9b/0x250 Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldiskfs_es_shrink+0x1ce/0x2a0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] shrink_slab+0x175/0x340 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? vmpressure+0x61/0x90 Nov 05 20:58:17 fir-md1-s1 kernel: [] zone_reclaim+0x1d1/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] get_page_from_freelist+0x87b/0xa70 Nov 05 20:58:17 fir-md1-s1 kernel: [] __alloc_pages_nodemask+0x176/0x420 Nov 05 20:58:17 fir-md1-s1 kernel: [] alloc_pages_current+0x98/0x110 Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_readpage+0x3cc/0x880 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: ff 8d 4a 01 89 d0 f0 0f b1 0f 39 d0 0f 84 fb fd ff ff 89 c2 eb e2 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 86 e0 fc ff ff <48> 89 e5 48 c1 e8 2b a8 01 74 15 48 8b 8a e0 fc ff ff b8 01 00 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [mdt_io02_036:41739] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 6 PID: 41739 Comm: mdt_io02_036 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e344a1040 ti: ffffa13e27604000 task.ti: ffffa13e27604000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13e27607800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a6107490 RCX: 0000000000310000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f85b780 RSI: 0000000001310101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13e27607800 R08: ffffa12e3f65b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f65f140 R11: ffffda91d9456000 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13e276077a0 R14: ffffa130a61071f8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa12e3f640000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 0000000000e8cc28 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ___slab_alloc+0x209/0x4f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __slab_free+0x81/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [mdt_io03_013:41597] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 7 PID: 41597 Comm: mdt_io03_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12dd3985140 ti: ffffa12d54fe0000 task.ti: ffffa12d54fe0000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12d54fe3800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a614d740 RCX: 0000000000390000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f55b780 RSI: 0000000000b90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12d54fe3800 R08: ffffa13e7f45b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f45f140 R11: ffffda920a915400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa12d54fe37a0 R14: ffffa130a614d4a8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fe333ba7740(0000) GS:ffffa13e7f440000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fe3337921cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __slab_free+0x81/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ldlm_cn01_010:28356] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 9 PID: 28356 Comm: ldlm_cn01_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2c98e180 ti: ffffa1375d76c000 task.ti: ffffa1375d76c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1375d76f880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a0e5278 RCX: 0000000000490000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f61b780 RSI: 0000000000110101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1375d76f880 R08: ffffa11e3f69b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f69f0c0 R11: ffffda91a4dd3e80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1375d76f810 R15: ffffa1375d76f8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f680000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [ldlm_cn02_005:22634] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 10 PID: 22634 Comm: ldlm_cn02_005 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12de0bd0000 ti: ffffa12dd3e4c000 task.ti: ffffa12dd3e4c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x15e/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12dd3e4f880 EFLAGS: 00000212 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000101 RBX: ffffa123e488bd88 RCX: 0000000000510000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: 0000000000410101 RSI: 0000000000000101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12dd3e4f880 R08: ffffa12e3f69b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f69f0c0 R11: ffffda91d5dcf580 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12dd3e4f810 R15: ffffa12dd3e4f8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f30a909c700(0000) GS:ffffa12e3f680000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f94c3031010 CR3: 0000004025876000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? finish_task_switch+0xa9/0x1c0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0f 18 09 8b 17 0f b7 c2 85 c0 74 21 83 f8 03 75 10 eb 1a 66 2e 0f 1f 84 00 00 00 00 00 85 c0 74 0c f3 90 8b 17 0f b7 c2 83 f8 03 <75> f0 be 01 00 00 00 eb 15 66 0f 1f 84 00 00 00 00 00 89 d0 f0 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [ldlm_cn03_018:60052] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 11 PID: 60052 Comm: ldlm_cn03_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa129b1a80000 ti: ffffa12553a58000 task.ti: ffffa12553a58000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12553a5b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a1358b48 RCX: 0000000000590000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f51b780 RSI: 0000000000990101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12553a5b880 R08: ffffa13e7f49b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f49f0c0 R11: ffffda920b3b9a80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12553a5b810 R15: ffffa12553a5b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f4fff165880(0000) GS:ffffa13e7f480000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f4fec4861c4 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [ldlm_cn00_018:53023] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 12 PID: 53023 Comm: ldlm_cn00_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13dcda91040 ti: ffffa1377131c000 task.ti: ffffa1377131c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1377131f880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11636187c58 RCX: 0000000000610000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f75b780 RSI: 0000000000a90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1377131f880 R08: ffffa10e3eedb780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3eedf0c0 R11: ffffda91911e8400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1377131f810 R15: ffffa1377131f8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f5587c09880(0000) GS:ffffa10e3eec0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 000000101efc0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [ldlm_cn01_018:60043] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 13 PID: 60043 Comm: ldlm_cn01_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa124c1346180 ti: ffffa12af5ae8000 task.ti: ffffa12af5ae8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12af5aeb880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7f1ca18 RCX: 0000000000690000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f41b780 RSI: 0000000000190101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12af5aeb880 R08: ffffa11e3f6db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f6df0c0 R11: ffffda919566ec00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12af5aeb810 R15: ffffa12af5aeb8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f8f6cd9d880(0000) GS:ffffa11e3f6c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f92f9f5f000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [mdt02_063:41272] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 14 PID: 41272 Comm: mdt02_063 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2b2f0000 ti: ffffa11e2ad0c000 task.ti: ffffa11e2ad0c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e2ad0f930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a129dad8 RCX: 0000000000710000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ef9b780 RSI: 0000000000c10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e2ad0f930 R08: ffffa12e3f6db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f6df0c0 R11: ffffda91f6155bc0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e2ad0f8c0 R15: ffffa11e2ad0f970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa12e3f6c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [mdt03_073:41348] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 15 PID: 41348 Comm: mdt03_073 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ee958a080 ti: ffffa111b79dc000 task.ti: ffffa111b79dc000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa111b79df930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a2f5f08 RCX: 0000000000790000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f79b780 RSI: 0000000000c90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa111b79df930 R08: ffffa13e7f4db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f4df0c0 R11: ffffda923c80c000 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa111b79df8c0 R15: ffffa111b79df970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f960ee14740(0000) GS:ffffa13e7f4c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f960e9ff1cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [mdt_io00_043:41776] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 16 PID: 41776 Comm: mdt_io00_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e283c30c0 ti: ffffa11e29d64000 task.ti: ffffa11e29d64000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e29d67800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179deae3d0 RCX: 0000000000810000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3f0db780 RSI: 0000000001610101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e29d67800 R08: ffffa10e3ef1b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ef1f140 R11: ffffda91ab5dec00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e29d677a0 R14: ffffa1179deae138 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f89ef4b3880(0000) GS:ffffa10e3ef00000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f89ec2d4000 CR3: 0000003035674000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ___slab_alloc+0x209/0x4f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [mdt01_077:41351] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 17 PID: 41351 Comm: mdt01_077 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e3743a080 ti: ffffa11e343dc000 task.ti: ffffa11e343dc000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e343df510 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a34a930 RCX: 0000000000890000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f61b780 RSI: 0000000001190101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e343df510 R08: ffffa11e3f71b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f71f140 R11: ffffda91b5cd5a00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e343df4b0 R14: ffffa11c4a34a698 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f700000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [ldlm_cn02_016:57179] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 18 PID: 57179 Comm: ldlm_cn02_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ace242080 ti: ffffa1081faf8000 task.ti: ffffa1081faf8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1081fafb880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a30dad8 RCX: 0000000000910000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f71b780 RSI: 0000000000890101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1081fafb880 R08: ffffa12e3f71b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f71f0c0 R11: ffffda91d6ed6e80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1081fafb810 R15: ffffa1081fafb8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f94d47aa880(0000) GS:ffffa12e3f700000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 0000000000412480 CR3: 000000202a620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [ldlm_cn03_016:60048] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 19 PID: 60048 Comm: ldlm_cn03_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa124c1344100 ti: ffffa121df2c4000 task.ti: ffffa121df2c4000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa121df2c7880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1257bb17828 RCX: 0000000000990000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f8db780 RSI: 0000000001690101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa121df2c7880 R08: ffffa13e7f51b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f51f0c0 R11: ffffda920af313c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa121df2c7810 R15: ffffa121df2c78c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f63eb995740(0000) GS:ffffa13e7f500000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00000000006e6bd4 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_run_ast_work+0x38/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [ldlm_cn00_001:21578] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 20 PID: 21578 Comm: ldlm_cn00_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12e16016180 ti: ffffa12e01e04000 task.ti: ffffa12e01e04000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12e01e07880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1163623b0f8 RCX: 0000000000a10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f81b780 RSI: 0000000001090101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12e01e07880 R08: ffffa10e3ef5b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ef5f0c0 R11: ffffda91a87a73c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12e01e07810 R15: ffffa12e01e078c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f5587c09880(0000) GS:ffffa10e3ef40000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 000000101efc0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [ldlm_cn01_004:22624] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 21 PID: 22624 Comm: ldlm_cn01_004 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ddfc1e180 ti: ffffa12e36a18000 task.ti: ffffa12e36a18000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12e36a1b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a2af828 RCX: 0000000000a90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ef5b780 RSI: 0000000000a10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12e36a1b880 R08: ffffa11e3f75b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f75f0c0 R11: ffffda9196c8ff00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12e36a1b810 R15: ffffa12e36a1b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f0d40dc7700(0000) GS:ffffa11e3f740000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f92f9f5f000 CR3: 000000402e2d0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_run_ast_work+0x38/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [mdt02_029:41121] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 22 PID: 41121 Comm: mdt02_029 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e302e9040 ti: ffffa11e39f40000 task.ti: ffffa11e39f40000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x1d6/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e39f43930 EFLAGS: 00000293 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000001 RBX: ffffa116363a2898 RCX: 0000000000000001 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e39f43930 R08: 0000000000000101 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f75f0c0 R11: ffffda91d4977d80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e39f438c0 R15: ffffa11e39f43970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fc8705cf740(0000) GS:ffffa12e3f740000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fe17d741024 CR3: 0000004029492000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 8b 07 0f b6 c0 83 f8 03 <75> f0 b8 01 00 00 00 66 89 07 5d c3 66 0f 1f 44 00 00 f3 90 4d Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [mdt_io03_040:41746] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 23 PID: 41746 Comm: mdt_io03_040 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e282c8000 ti: ffffa11e282b8000 task.ti: ffffa11e282b8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e282bb800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a6142d60 RCX: 0000000000b90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f79b780 RSI: 0000000000d10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e282bb800 R08: ffffa13e7f55b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f55f140 R11: ffffda920af4e800 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e282bb7a0 R14: ffffa130a6142ac8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f28fc018840(0000) GS:ffffa13e7f540000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fff565d8ec8 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [ldlm_cn00_019:53024] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 24 PID: 53024 Comm: ldlm_cn00_019 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12d55e3e180 ti: ffffa1392db80000 task.ti: ffffa1392db80000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1392db83880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a0b93a8 RCX: 0000000000c10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f5db780 RSI: 0000000000f90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1392db83880 R08: ffffa10e3ef9b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ef9f0c0 R11: ffffda9184666b00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1392db83810 R15: ffffa1392db838c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f92f9f50900(0000) GS:ffffa10e3ef80000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f92f9f5f000 CR3: 0000002024620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [mdt01_035:41154] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 25 PID: 41154 Comm: mdt01_035 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e28699040 ti: ffffa119ede9c000 task.ti: ffffa119ede9c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa119ede9f930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a09c5e8 RCX: 0000000000c90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f6db780 RSI: 0000000000710101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa119ede9f930 R08: ffffa11e3f79b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f79f0c0 R11: ffffda919b6f8a80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa119ede9f8c0 R15: ffffa119ede9f970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f780000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#26 stuck for 22s! [mdt_io02_020:41632] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 26 PID: 41632 Comm: mdt_io02_020 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e344a30c0 ti: ffffa13e32768000 task.ti: ffffa13e32768000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13e3276b800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa114cf21c680 RCX: 0000000000d10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f89b780 RSI: 0000000001510101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13e3276b800 R08: ffffa12e3f79b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f79f140 R11: ffffda91de5c0e00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13e3276b7a0 R14: ffffa114cf21c3e8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f30a908b700(0000) GS:ffffa12e3f780000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f12d03b0be8 CR3: 0000004025876000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#27 stuck for 22s! [ldlm_cn03_020:61203] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 27 PID: 61203 Comm: ldlm_cn03_020 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa10e1cc41040 ti: ffffa10befa90000 task.ti: ffffa10befa90000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa10befa93880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7f16338 RCX: 0000000000d90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f69b780 RSI: 0000000001590101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa10befa93880 R08: ffffa13e7f59b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f59f0c0 R11: ffffda920b34cc00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa10befa93810 R15: ffffa10befa938c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fe66ee55740(0000) GS:ffffa13e7f580000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fe66ea401cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [mdt_io00_010:41568] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 28 PID: 41568 Comm: mdt_io00_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12d807b2080 ti: ffffa122a83c8000 task.ti: ffffa122a83c8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa122a83cb800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa10b9ff1d740 RCX: 0000000000e10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f65b780 RSI: 0000000000310101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa122a83cb800 R08: ffffa10e3efdb780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3efdf140 R11: ffffda9161adfa00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa122a83cb7a0 R14: ffffa10b9ff1d4a8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa10e3efc0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000202120e000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cpumask_next_and+0x35/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [ldlm_cn01_031:61424] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 29 PID: 61424 Comm: ldlm_cn01_031 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e353da080 ti: ffffa12d26e28000 task.ti: ffffa12d26e28000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12d26e2b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1122d5993a8 RCX: 0000000000e90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f6db780 RSI: 0000000000690101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12d26e2b880 R08: ffffa11e3f7db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f7df0c0 R11: ffffda91b69a8c80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12d26e2b810 R15: ffffa12d26e2b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f0d40dc7700(0000) GS:ffffa11e3f7c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f6f9994f000 CR3: 000000402e2d0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [mdt02_039:41166] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 30 PID: 41166 Comm: mdt02_039 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2dcfe180 ti: ffffa11e2a110000 task.ti: ffffa11e2a110000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e2a113510 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7f48be0 RCX: 0000000000f10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ee5b780 RSI: 0000000000210101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e2a113510 R08: ffffa12e3f7db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f7df140 R11: ffffda91c3043a00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e2a1134b0 R14: ffffa129c7f48948 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa12e3f7c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 0000000000412480 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s! [ldlm_cn03_009:26820] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 31 PID: 26820 Comm: ldlm_cn03_009 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13e3a2db0c0 ti: ffffa13dc9ea8000 task.ti: ffffa13dc9ea8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13dc9eab880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a12b2038 RCX: 0000000000f90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f85b780 RSI: 0000000001290101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13dc9eab880 R08: ffffa13e7f5db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f5df0c0 R11: ffffda920b623580 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa13dc9eab810 R15: ffffa13dc9eab8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3e14491740(0000) GS:ffffa13e7f5c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00000000027d0fe8 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_server_handle_req_in+0x8df/0xd60 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#32 stuck for 22s! [mdt_io00_015:41626] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 32 PID: 41626 Comm: mdt_io00_015 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13705f09040 ti: ffffa13e33318000 task.ti: ffffa13e33318000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13e3331b800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179df1dfa0 RCX: 0000000001010000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3efdb780 RSI: 0000000000e10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13e3331b800 R08: ffffa10e3f01b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3f01f140 R11: ffffda9161b7e400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13e3331b7a0 R14: ffffa1179df1dd08 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fecb56f0740(0000) GS:ffffa10e3f000000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fecb52da248 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#33 stuck for 22s! [ldlm_cn01_029:61422] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 33 PID: 61422 Comm: ldlm_cn01_029 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e353dc100 ti: ffffa122caae8000 task.ti: ffffa122caae8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa122caaeb880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1122d4241b8 RCX: 0000000001090000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f71b780 RSI: 0000000000910101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa122caaeb880 R08: ffffa11e3f81b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f81f0c0 R11: ffffda9184296d00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa122caaeb810 R15: ffffa122caaeb8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f800000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [ldlm_cn02_011:44699] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 34 PID: 44699 Comm: ldlm_cn02_011 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13e35956180 ti: ffffa12ba4644000 task.ti: ffffa12ba4644000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12ba4647880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7e87c58 RCX: 0000000001110000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f69b780 RSI: 0000000000510101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12ba4647880 R08: ffffa12e3f81b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f81f0c0 R11: ffffda91fb48c140 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12ba4647810 R15: ffffa12ba46478c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f74af776740(0000) GS:ffffa12e3f800000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f74ae443330 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [ldlm_cn03_000:21586] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 35 PID: 21586 Comm: ldlm_cn03_000 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12e149d9040 ti: ffffa12e01e38000 task.ti: ffffa12e01e38000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12e01e3b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a13373f8 RCX: 0000000001190000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f69b780 RSI: 0000000000490101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12e01e3b880 R08: ffffa13e7f61b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f61f0c0 R11: ffffda920b1686c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12e01e3b810 R15: ffffa12e01e3b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f076ac83740(0000) GS:ffffa13e7f600000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f076a506320 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#36 stuck for 22s! [mdt_io00_032:41737] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 36 PID: 41737 Comm: mdt_io00_032 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2a1e2080 ti: ffffa13abe3dc000 task.ti: ffffa13abe3dc000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13abe3df800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179ddf1ca0 RCX: 0000000001210000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ee1b780 RSI: 0000000000010101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13abe3df800 R08: ffffa10e3f05b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3f05f140 R11: ffffda91c3226e00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13abe3df7a0 R14: ffffa1179ddf1a08 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f38cab19700(0000) GS:ffffa10e3f040000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67f000 CR3: 0000004026a50000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? load_balance+0x1be/0x9a0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#37 stuck for 22s! [ldlm_cn01_025:60053] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 37 PID: 60053 Comm: ldlm_cn01_025 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa129b1a81040 ti: ffffa1257dff0000 task.ti: ffffa1257dff0000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1257dff3880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a0cc5e8 RCX: 0000000001290000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3eedb780 RSI: 0000000000610101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1257dff3880 R08: ffffa11e3f85b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f85f0c0 R11: ffffda91921d70c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1257dff3810 R15: ffffa1257dff38c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f2faba6b740(0000) GS:ffffa11e3f840000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f2fab52c8f0 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [mdt_io02_034:41734] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 38 PID: 41734 Comm: mdt_io02_034 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2a1e4100 ti: ffffa13cd3a70000 task.ti: ffffa13cd3a70000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13cd3a73800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a9500be0 RCX: 0000000001310000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f8db780 RSI: 0000000001710101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13cd3a73800 R08: ffffa12e3f85b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f85f140 R11: ffffda91d59da200 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13cd3a737a0 R14: ffffa130a9500948 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f38ccb1d700(0000) GS:ffffa12e3f840000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 000000000124f178 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#39 stuck for 22s! [mdt_io03_033:41722] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 39 PID: 41722 Comm: mdt_io03_033 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12114f0a080 ti: ffffa12d8d798000 task.ti: ffffa12d8d798000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12d8d79b800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a610a500 RCX: 0000000001390000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f7db780 RSI: 0000000000f10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12d8d79b800 R08: ffffa13e7f65b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f65f140 R11: ffffda920b340e00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa12d8d79b7a0 R14: ffffa130a610a268 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f28fc018840(0000) GS:ffffa13e7f640000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fff565b7e28 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kfree+0x106/0x140 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#40 stuck for 22s! [mdt_io00_028:41672] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 40 PID: 41672 Comm: mdt_io00_028 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12a51e7e180 ti: ffffa12df7f5c000 task.ti: ffffa12df7f5c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12df7f5f800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179dd7d740 RCX: 0000000001410000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f89b780 RSI: 0000000001490101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12df7f5f800 R08: ffffa10e3f09b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3f09f140 R11: ffffda9191921000 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa12df7f5f7a0 R14: ffffa1179dd7d4a8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f8f6cd9d880(0000) GS:ffffa10e3f080000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f8f6cdb3000 CR3: 0000004022bfc000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? load_balance+0x1be/0x9a0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [mdt01_031:41137] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 41 PID: 41137 Comm: mdt01_031 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e34580000 ti: ffffa11e366c0000 task.ti: ffffa11e366c0000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e366c3930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7c0c5e8 RCX: 0000000001490000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f6db780 RSI: 0000000001790101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e366c3930 R08: ffffa11e3f89b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f89f0c0 R11: ffffda919bac6780 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e366c38c0 R15: ffffa11e366c3970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa11e3f880000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#42 stuck for 22s! [mdt_io02_017:41591] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 42 PID: 41591 Comm: mdt_io02_017 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ddfd31040 ti: ffffa12d54f0c000 task.ti: ffffa12d54f0c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12d54f0f800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179ddf5740 RCX: 0000000001510000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f65b780 RSI: 0000000001390101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12d54f0f800 R08: ffffa12e3f89b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f89f140 R11: ffffda91c31d4e00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa12d54f0f7a0 R14: ffffa1179ddf54a8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f94d47aa880(0000) GS:ffffa12e3f880000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f94c30e1010 CR3: 000000202a620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#43 stuck for 22s! [ldlm_cn03_028:61247] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 43 PID: 61247 Comm: ldlm_cn03_028 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa129b1a82080 ti: ffffa128f4b88000 task.ti: ffffa128f4b88000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa128f4b8b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa12b7c0c56a8 RCX: 0000000001590000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f61b780 RSI: 0000000000090101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa128f4b8b880 R08: ffffa13e7f69b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f69f0c0 R11: ffffda91c3097340 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa128f4b8b810 R15: ffffa128f4b8b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f9cd359a740(0000) GS:ffffa13e7f680000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f9cd31851cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#44 stuck for 22s! [mdt_io00_001:40674] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 44 PID: 40674 Comm: mdt_io00_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa1113b326180 ti: ffffa1126fef8000 task.ti: ffffa1126fef8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1126fefb800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa12b7c054ee0 RCX: 0000000001610000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3f01b780 RSI: 0000000001010101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1126fefb800 R08: ffffa10e3f0db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3f0df140 R11: ffffda920a9b6800 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa1126fefb7a0 R14: ffffa12b7c054c48 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f5420c78700(0000) GS:ffffa10e3f0c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f636f0e8000 CR3: 0000003014466000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __slab_free+0x81/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [ldlm_cn01_005:22632] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 45 PID: 22632 Comm: ldlm_cn01_005 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ddfd36180 ti: ffffa12ddfd24000 task.ti: ffffa12ddfd24000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12ddfd27880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a8286338 RCX: 0000000001690000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f7db780 RSI: 0000000000e90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12ddfd27880 R08: ffffa11e3f8db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f8df0c0 R11: ffffda91d4c6b100 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12ddfd27810 R15: ffffa12ddfd278c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f92f9f50900(0000) GS:ffffa11e3f8c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f6e8b3441cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [mdt_io02_046:41773] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 46 PID: 41773 Comm: mdt_io02_046 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e283c6180 ti: ffffa11e2a6e8000 task.ti: ffffa11e2a6e8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e2a6eb800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa121143a2500 RCX: 0000000001710000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f45b780 RSI: 0000000000390101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e2a6eb800 R08: ffffa12e3f8db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f8df140 R11: ffffda91c30ef600 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e2a6eb7a0 R14: ffffa121143a2268 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa12e3f8c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 000000000280e248 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#47 stuck for 22s! [mdt03_107:41490] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 47 PID: 41490 Comm: mdt03_107 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ce5d10000 ti: ffffa12a3a350000 task.ti: ffffa12a3a350000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12a3a353510 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a137ec30 RCX: 0000000001790000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f4db780 RSI: 0000000000790101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12a3a353510 R08: ffffa13e7f6db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f6df140 R11: ffffda91b8d46800 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa12a3a3534b0 R14: ffffa130a137e998 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f67bd132740(0000) GS:ffffa13e7f6c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 0000000000412480 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [mdt_io00_016:41638] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 8 PID: 41638 Comm: mdt_io00_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13947bde180 ti: ffffa13ce1bc4000 task.ti: ffffa13ce1bc4000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13ce1bc7800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179dd3b5c0 RCX: 0000000000410000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ef1b780 RSI: 0000000000810101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13ce1bc7800 R08: ffffa10e3ee9b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ee9f140 R11: ffffda916229f000 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13ce1bc77a0 R14: ffffa1179dd3b328 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa10e3ee80000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f6f9994f000 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ___slab_alloc+0x209/0x4f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [mdt_io00_052:41794] Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [mdt01_055:41234] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 1 PID: 41234 Comm: mdt01_055 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e370bc100 ti: ffffa11e2812c000 task.ti: ffffa11e2812c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e2812f930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a2f7828 RCX: 0000000000090000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f49b780 RSI: 0000000000590101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e2812f930 R08: ffffa11e3f61b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f61f0c0 R11: ffffda9191d2c600 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e2812f8c0 R15: ffffa11e2812f970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa11e3f600000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ldlm_cn02_013:52318] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 2 PID: 52318 Comm: ldlm_cn02_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e32cfa080 ti: ffffa13716318000 task.ti: ffffa13716318000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1371631b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7caa038 RCX: 0000000000110000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f59b780 RSI: 0000000000d90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1371631b880 R08: ffffa12e3f61b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f61f0c0 R11: ffffda91ee4ef9c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1371631b810 R15: ffffa1371631b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f94d47aa880(0000) GS:ffffa12e3f600000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f94c3685010 CR3: 000000202a620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [ldlm_cn03_025:61222] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 3 PID: 61222 Comm: ldlm_cn03_025 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13e397dc100 ti: ffffa13679694000 task.ti: ffffa13679694000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13679697880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a320f78 RCX: 0000000000190000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f81b780 RSI: 0000000001110101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13679697880 R08: ffffa13e7f41b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f41f0c0 R11: ffffda920c54c8c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa13679697810 R15: ffffa136796978c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f69c209d740(0000) GS:ffffa13e7f400000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f69c20aa000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [mdt00_083:41365] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 4 PID: 41365 Comm: mdt00_083 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa127be26c100 ti: ffffa127c1ed8000 task.ti: ffffa127c1ed8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa127c1edb510 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7cb4250 RCX: 0000000000210000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3f05b780 RSI: 0000000001210101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa127c1edb510 R08: ffffa10e3ee5b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ee5f140 R11: ffffda9173e6a400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa127c1edb4b0 R14: ffffa129c7cb3fb8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fb30a5e6880(0000) GS:ffffa10e3ee40000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 00000010209aa000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_resource_putref+0x199/0x260 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [mdt_rdpg01_042:61620] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 5 PID: 61620 Comm: mdt_rdpg01_042 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa128b43b0000 ti: ffffa11f28f08000 task.ti: ffffa11f28f08000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] ldiskfs_inode_touch_time_cmp+0xd/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0000:ffffa11f28f0b728 EFLAGS: 00000282 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 8000040400080000 RBX: ffffffffbe89b6f4 RCX: 00000001061bb104 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10fdb619878 RSI: ffffa1248602b5c8 RDI: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11f28f0b778 R08: 000000000000000a R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: 0000000000001831 R11: ffffa11f28f0b486 R12: 0000000000000006 Nov 05 20:58:17 fir-md1-s1 kernel: R13: 0000000000000032 R14: 0000000000000000 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f2faba6b740(0000) GS:ffffa11e3f640000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00000000025a6028 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] ? merge+0x62/0xc0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_init_inode_table+0x410/0x410 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] list_sort+0x9b/0x250 Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldiskfs_es_shrink+0x1ce/0x2a0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_shrink+0xb4/0x130 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] shrink_slab+0x175/0x340 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? vmpressure+0x61/0x90 Nov 05 20:58:17 fir-md1-s1 kernel: [] zone_reclaim+0x1d1/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] get_page_from_freelist+0x87b/0xa70 Nov 05 20:58:17 fir-md1-s1 kernel: [] __alloc_pages_nodemask+0x176/0x420 Nov 05 20:58:17 fir-md1-s1 kernel: [] alloc_pages_current+0x98/0x110 Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_readpage+0x3cc/0x880 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: ff 8d 4a 01 89 d0 f0 0f b1 0f 39 d0 0f 84 fb fd ff ff 89 c2 eb e2 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 86 e0 fc ff ff <48> 89 e5 48 c1 e8 2b a8 01 74 15 48 8b 8a e0 fc ff ff b8 01 00 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [mdt_io02_036:41739] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 6 PID: 41739 Comm: mdt_io02_036 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e344a1040 ti: ffffa13e27604000 task.ti: ffffa13e27604000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa13e27607800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a6107490 RCX: 0000000000310000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f85b780 RSI: 0000000001310101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa13e27607800 R08: ffffa12e3f65b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f65f140 R11: ffffda91d9456000 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa13e276077a0 R14: ffffa130a61071f8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa12e3f640000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 0000000000e8cc28 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ___slab_alloc+0x209/0x4f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __slab_free+0x81/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [mdt_io03_013:41597] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 7 PID: 41597 Comm: mdt_io03_013 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12dd3985140 ti: ffffa12d54fe0000 task.ti: ffffa12d54fe0000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12d54fe3800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a614d740 RCX: 0000000000390000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f55b780 RSI: 0000000000b90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12d54fe3800 R08: ffffa13e7f45b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f45f140 R11: ffffda920a915400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa12d54fe37a0 R14: ffffa130a614d4a8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fe333ba7740(0000) GS:ffffa13e7f440000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fe3337921cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __slab_free+0x81/0x2f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ldlm_cn01_010:28356] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 9 PID: 28356 Comm: ldlm_cn01_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2c98e180 ti: ffffa1375d76c000 task.ti: ffffa1375d76c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1375d76f880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a0e5278 RCX: 0000000000490000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f61b780 RSI: 0000000000110101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1375d76f880 R08: ffffa11e3f69b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f69f0c0 R11: ffffda91a4dd3e80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1375d76f810 R15: ffffa1375d76f8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f680000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [ldlm_cn02_005:22634] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 10 PID: 22634 Comm: ldlm_cn02_005 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12de0bd0000 ti: ffffa12dd3e4c000 task.ti: ffffa12dd3e4c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x156/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12dd3e4f880 EFLAGS: 00000202 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000101 RBX: ffffa123e488bd88 RCX: 0000000000510000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: 0000000000410101 RSI: 0000000000000101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12dd3e4f880 R08: ffffa12e3f69b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f69f0c0 R11: ffffda91d5dcf580 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12dd3e4f810 R15: ffffa12dd3e4f8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f30a909c700(0000) GS:ffffa12e3f680000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f94c3031010 CR3: 0000004025876000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? finish_task_switch+0xa9/0x1c0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 85 c0 74 21 83 f8 03 75 10 eb 1a 66 2e 0f 1f 84 00 00 00 00 00 85 c0 74 0c f3 90 <8b> 17 0f b7 c2 83 f8 03 75 f0 be 01 00 00 00 eb 15 66 0f 1f 84 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [ldlm_cn03_018:60052] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 11 PID: 60052 Comm: ldlm_cn03_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa129b1a80000 ti: ffffa12553a58000 task.ti: ffffa12553a58000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12553a5b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a1358b48 RCX: 0000000000590000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f51b780 RSI: 0000000000990101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12553a5b880 R08: ffffa13e7f49b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f49f0c0 R11: ffffda920b3b9a80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12553a5b810 R15: ffffa12553a5b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f4fff165880(0000) GS:ffffa13e7f480000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f4fec4861c4 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [ldlm_cn00_018:53023] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 12 PID: 53023 Comm: ldlm_cn00_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa13dcda91040 ti: ffffa1377131c000 task.ti: ffffa1377131c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1377131f880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11636187c58 RCX: 0000000000610000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f75b780 RSI: 0000000000a90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1377131f880 R08: ffffa10e3eedb780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3eedf0c0 R11: ffffda91911e8400 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1377131f810 R15: ffffa1377131f8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f5587c09880(0000) GS:ffffa10e3eec0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 000000101efc0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [ldlm_cn01_018:60043] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 13 PID: 60043 Comm: ldlm_cn01_018 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa124c1346180 ti: ffffa12af5ae8000 task.ti: ffffa12af5ae8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12af5aeb880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7f1ca18 RCX: 0000000000690000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f41b780 RSI: 0000000000190101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12af5aeb880 R08: ffffa11e3f6db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f6df0c0 R11: ffffda919566ec00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12af5aeb810 R15: ffffa12af5aeb8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f8f6cd9d880(0000) GS:ffffa11e3f6c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f92f9f5f000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [mdt02_063:41272] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 14 PID: 41272 Comm: mdt02_063 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e2b2f0000 ti: ffffa11e2ad0c000 task.ti: ffffa11e2ad0c000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e2ad0f930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a129dad8 RCX: 0000000000710000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ef9b780 RSI: 0000000000c10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e2ad0f930 R08: ffffa12e3f6db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f6df0c0 R11: ffffda91f6155bc0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e2ad0f8c0 R15: ffffa11e2ad0f970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa12e3f6c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [mdt03_073:41348] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 15 PID: 41348 Comm: mdt03_073 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ee958a080 ti: ffffa111b79dc000 task.ti: ffffa111b79dc000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa111b79df930 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a2f5f08 RCX: 0000000000790000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f79b780 RSI: 0000000000c90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa111b79df930 R08: ffffa13e7f4db780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f4df0c0 R11: ffffda923c80c000 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa111b79df8c0 R15: ffffa111b79df970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f960ee14740(0000) GS:ffffa13e7f4c0000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f960e9ff1cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [mdt_io00_043:41776] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 16 PID: 41776 Comm: mdt_io00_043 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e283c30c0 ti: ffffa11e29d64000 task.ti: ffffa11e29d64000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e29d67800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179deae3d0 RCX: 0000000000810000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3f0db780 RSI: 0000000001610101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e29d67800 R08: ffffa10e3ef1b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ef1f140 R11: ffffda91ab5dec00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e29d677a0 R14: ffffa1179deae138 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f89ef4b3880(0000) GS:ffffa10e3ef00000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f89ec2d4000 CR3: 0000003035674000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ___slab_alloc+0x209/0x4f0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [mdt01_077:41351] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 17 PID: 41351 Comm: mdt01_077 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e3743a080 ti: ffffa11e343dc000 task.ti: ffffa11e343dc000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e343df510 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a34a930 RCX: 0000000000890000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f61b780 RSI: 0000000001190101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e343df510 R08: ffffa11e3f71b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f71f140 R11: ffffda91b5cd5a00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e343df4b0 R14: ffffa11c4a34a698 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f700000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:17 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [ldlm_cn02_016:57179] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 18 PID: 57179 Comm: ldlm_cn02_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ace242080 ti: ffffa1081faf8000 task.ti: ffffa1081faf8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1081fafb880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a30dad8 RCX: 0000000000910000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f71b780 RSI: 0000000000890101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1081fafb880 R08: ffffa12e3f71b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f71f0c0 R11: ffffda91d6ed6e80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1081fafb810 R15: ffffa1081fafb8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f94d47aa880(0000) GS:ffffa12e3f700000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 0000000000412480 CR3: 000000202a620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [ldlm_cn03_016:60048] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 19 PID: 60048 Comm: ldlm_cn03_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa124c1344100 ti: ffffa121df2c4000 task.ti: ffffa121df2c4000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa121df2c7880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1257bb17828 RCX: 0000000000990000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f8db780 RSI: 0000000001690101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa121df2c7880 R08: ffffa13e7f51b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f51f0c0 R11: ffffda920af313c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa121df2c7810 R15: ffffa121df2c78c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f63eb995740(0000) GS:ffffa13e7f500000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00000000006e6bd4 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_run_ast_work+0x38/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [ldlm_cn00_001:21578] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 20 PID: 21578 Comm: ldlm_cn00_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12e16016180 ti: ffffa12e01e04000 task.ti: ffffa12e01e04000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12e01e07880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1163623b0f8 RCX: 0000000000a10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa11e3f81b780 RSI: 0000000001090101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12e01e07880 R08: ffffa10e3ef5b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ef5f0c0 R11: ffffda91a87a73c0 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12e01e07810 R15: ffffa12e01e078c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f5587c09880(0000) GS:ffffa10e3ef40000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f38cf67e000 CR3: 000000101efc0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [ldlm_cn01_004:22624] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 21 PID: 22624 Comm: ldlm_cn01_004 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12ddfc1e180 ti: ffffa12e36a18000 task.ti: ffffa12e36a18000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa12e36a1b880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a2af828 RCX: 0000000000a90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa10e3ef5b780 RSI: 0000000000a10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa12e36a1b880 R08: ffffa11e3f75b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa11e3f75f0c0 R11: ffffda9196c8ff00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12e36a1b810 R15: ffffa12e36a1b8c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f0d40dc7700(0000) GS:ffffa11e3f740000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f92f9f5f000 CR3: 000000402e2d0000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_run_ast_work+0x38/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [mdt02_029:41121] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 22 PID: 41121 Comm: mdt02_029 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e302e9040 ti: ffffa11e39f40000 task.ti: ffffa11e39f40000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x1d0/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e39f43930 EFLAGS: 00000202 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000410101 RBX: ffffa116363a2898 RCX: 0000000000000001 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e39f43930 R08: 0000000000000101 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa12e3f75f0c0 R11: ffffda91d4977d80 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e39f438c0 R15: ffffa11e39f43970 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007fc8705cf740(0000) GS:ffffa12e3f740000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fe17d741024 CR3: 0000004029492000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: fe 00 01 00 00 74 f4 e9 93 fe ff ff 0f 1f 80 00 00 00 00 83 fa 01 75 11 0f 1f 00 e9 68 fe ff ff 0f 1f 00 85 c0 74 0c f3 90 8b 07 <0f> b6 c0 83 f8 03 75 f0 b8 01 00 00 00 66 89 07 5d c3 66 0f 1f Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [mdt_io03_040:41746] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 23 PID: 41746 Comm: mdt_io03_040 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa11e282c8000 ti: ffffa11e282b8000 task.ti: ffffa11e282b8000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa11e282bb800 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a6142d60 RCX: 0000000000b90000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa12e3f79b780 RSI: 0000000000d10101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa11e282bb800 R08: ffffa13e7f55b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa13e7f55f140 R11: ffffda920af4e800 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffa11e282bb7a0 R14: ffffa130a6142ac8 R15: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f28fc018840(0000) GS:ffffa13e7f540000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007fff565d8ec8 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:17 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:17 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [ldlm_cn00_019:53024] Nov 05 20:58:17 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:17 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:17 fir-md1-s1 kernel: CPU: 24 PID: 53024 Comm: ldlm_cn00_019 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:17 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:17 fir-md1-s1 kernel: task: ffffa12d55e3e180 ti: ffffa1392db80000 task.ti: ffffa1392db80000 Nov 05 20:58:17 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:17 fir-md1-s1 kernel: RSP: 0018:ffffa1392db83880 EFLAGS: 00000246 Nov 05 20:58:17 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a0b93a8 RCX: 0000000000c10000 Nov 05 20:58:17 fir-md1-s1 kernel: RDX: ffffa13e7f5db780 RSI: 0000000000f90101 RDI: ffffa13e3710f480 Nov 05 20:58:17 fir-md1-s1 kernel: RBP: ffffa1392db83880 R08: ffffa10e3ef9b780 R09: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R10: ffffa10e3ef9f0c0 R11: ffffda9184666b00 R12: 0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1392db83810 R15: ffffa1392db838c0 Nov 05 20:58:17 fir-md1-s1 kernel: FS: 00007f92f9f50900(0000) GS:ffffa10e3ef80000(0000) knlGS:0000000000000000 Nov 05 20:58:17 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:17 fir-md1-s1 kernel: CR2: 00007f92f9f5f000 CR3: 0000002024620000 CR4: 00000000003407e0 Nov 05 20:58:17 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:17 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:17 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:17 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:17 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:17 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:17 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:17 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [mdt01_035:41154] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 25 PID: 41154 Comm: mdt01_035 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e28699040 ti: ffffa119ede9c000 task.ti: ffffa119ede9c000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa119ede9f930 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a09c5e8 RCX: 0000000000c90000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f6db780 RSI: 0000000000710101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa119ede9f930 R08: ffffa11e3f79b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa11e3f79f0c0 R11: ffffda919b6f8a80 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa119ede9f8c0 R15: ffffa119ede9f970 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f780000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#26 stuck for 22s! [mdt_io02_020:41632] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 26 PID: 41632 Comm: mdt_io02_020 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e344a30c0 ti: ffffa13e32768000 task.ti: ffffa13e32768000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa13e3276b800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa114cf21c680 RCX: 0000000000d10000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f89b780 RSI: 0000000001510101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa13e3276b800 R08: ffffa12e3f79b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa12e3f79f140 R11: ffffda91de5c0e00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa13e3276b7a0 R14: ffffa114cf21c3e8 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f30a908b700(0000) GS:ffffa12e3f780000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f12d03b0be8 CR3: 0000004025876000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#27 stuck for 22s! [ldlm_cn03_020:61203] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 27 PID: 61203 Comm: ldlm_cn03_020 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa10e1cc41040 ti: ffffa10befa90000 task.ti: ffffa10befa90000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa10befa93880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7f16338 RCX: 0000000000d90000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa13e7f69b780 RSI: 0000000001590101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa10befa93880 R08: ffffa13e7f59b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa13e7f59f0c0 R11: ffffda920b34cc00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa10befa93810 R15: ffffa10befa938c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007fe66ee55740(0000) GS:ffffa13e7f580000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007fe66ea401cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [mdt_io00_010:41568] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 28 PID: 41568 Comm: mdt_io00_010 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa12d807b2080 ti: ffffa122a83c8000 task.ti: ffffa122a83c8000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa122a83cb800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa10b9ff1d740 RCX: 0000000000e10000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f65b780 RSI: 0000000000310101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa122a83cb800 R08: ffffa10e3efdb780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa10e3efdf140 R11: ffffda9161adfa00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa122a83cb7a0 R14: ffffa10b9ff1d4a8 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa10e3efc0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000202120e000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cpumask_next_and+0x35/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [ldlm_cn01_031:61424] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 29 PID: 61424 Comm: ldlm_cn01_031 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e353da080 ti: ffffa12d26e28000 task.ti: ffffa12d26e28000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12d26e2b880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1122d5993a8 RCX: 0000000000e90000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa11e3f6db780 RSI: 0000000000690101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12d26e2b880 R08: ffffa11e3f7db780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa11e3f7df0c0 R11: ffffda91b69a8c80 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12d26e2b810 R15: ffffa12d26e2b8c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f0d40dc7700(0000) GS:ffffa11e3f7c0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f6f9994f000 CR3: 000000402e2d0000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [mdt02_039:41166] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 30 PID: 41166 Comm: mdt02_039 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e2dcfe180 ti: ffffa11e2a110000 task.ti: ffffa11e2a110000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa11e2a113510 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7f48be0 RCX: 0000000000f10000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa10e3ee5b780 RSI: 0000000000210101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa11e2a113510 R08: ffffa12e3f7db780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa12e3f7df140 R11: ffffda91c3043a00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa11e2a1134b0 R14: ffffa129c7f48948 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa12e3f7c0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 0000000000412480 CR3: 00000030172e8000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:18 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:18 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#31 stuck for 22s! [ldlm_cn03_009:26820] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 31 PID: 26820 Comm: ldlm_cn03_009 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa13e3a2db0c0 ti: ffffa13dc9ea8000 task.ti: ffffa13dc9ea8000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa13dc9eab880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a12b2038 RCX: 0000000000f90000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa11e3f85b780 RSI: 0000000001290101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa13dc9eab880 R08: ffffa13e7f5db780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa13e7f5df0c0 R11: ffffda920b623580 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa13dc9eab810 R15: ffffa13dc9eab8c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f3e14491740(0000) GS:ffffa13e7f5c0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00000000027d0fe8 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_server_handle_req_in+0x8df/0xd60 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#32 stuck for 22s! [mdt_io00_015:41626] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 32 PID: 41626 Comm: mdt_io00_015 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa13705f09040 ti: ffffa13e33318000 task.ti: ffffa13e33318000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa13e3331b800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179df1dfa0 RCX: 0000000001010000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa10e3efdb780 RSI: 0000000000e10101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa13e3331b800 R08: ffffa10e3f01b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa10e3f01f140 R11: ffffda9161b7e400 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa13e3331b7a0 R14: ffffa1179df1dd08 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007fecb56f0740(0000) GS:ffffa10e3f000000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007fecb52da248 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#33 stuck for 22s! [ldlm_cn01_029:61422] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 33 PID: 61422 Comm: ldlm_cn01_029 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e353dc100 ti: ffffa122caae8000 task.ti: ffffa122caae8000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa122caaeb880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1122d4241b8 RCX: 0000000001090000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f71b780 RSI: 0000000000910101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa122caaeb880 R08: ffffa11e3f81b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa11e3f81f0c0 R11: ffffda9184296d00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa122caaeb810 R15: ffffa122caaeb8c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa11e3f800000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#34 stuck for 22s! [ldlm_cn02_011:44699] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 34 PID: 44699 Comm: ldlm_cn02_011 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa13e35956180 ti: ffffa12ba4644000 task.ti: ffffa12ba4644000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12ba4647880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7e87c58 RCX: 0000000001110000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f69b780 RSI: 0000000000510101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12ba4647880 R08: ffffa12e3f81b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa12e3f81f0c0 R11: ffffda91fb48c140 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12ba4647810 R15: ffffa12ba46478c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f74af776740(0000) GS:ffffa12e3f800000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f74ae443330 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [ldlm_cn03_000:21586] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 35 PID: 21586 Comm: ldlm_cn03_000 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa12e149d9040 ti: ffffa12e01e38000 task.ti: ffffa12e01e38000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12e01e3b880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a13373f8 RCX: 0000000001190000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa11e3f69b780 RSI: 0000000000490101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12e01e3b880 R08: ffffa13e7f61b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa13e7f61f0c0 R11: ffffda920b1686c0 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa12e01e3b810 R15: ffffa12e01e3b8c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f076ac83740(0000) GS:ffffa13e7f600000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f076a506320 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#36 stuck for 22s! [mdt_io00_032:41737] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 36 PID: 41737 Comm: mdt_io00_032 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e2a1e2080 ti: ffffa13abe3dc000 task.ti: ffffa13abe3dc000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa13abe3df800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179ddf1ca0 RCX: 0000000001210000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa10e3ee1b780 RSI: 0000000000010101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa13abe3df800 R08: ffffa10e3f05b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa10e3f05f140 R11: ffffda91c3226e00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa13abe3df7a0 R14: ffffa1179ddf1a08 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f38cab19700(0000) GS:ffffa10e3f040000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f38cf67f000 CR3: 0000004026a50000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? load_balance+0x1be/0x9a0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#37 stuck for 22s! [ldlm_cn01_025:60053] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 37 PID: 60053 Comm: ldlm_cn01_025 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa129b1a81040 ti: ffffa1257dff0000 task.ti: ffffa1257dff0000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa1257dff3880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa11c4a0cc5e8 RCX: 0000000001290000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa10e3eedb780 RSI: 0000000000610101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa1257dff3880 R08: ffffa11e3f85b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa11e3f85f0c0 R11: ffffda91921d70c0 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa1257dff3810 R15: ffffa1257dff38c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f2faba6b740(0000) GS:ffffa11e3f840000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f2fab52c8f0 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! [mdt_io02_034:41734] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 38 PID: 41734 Comm: mdt_io02_034 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e2a1e4100 ti: ffffa13cd3a70000 task.ti: ffffa13cd3a70000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa13cd3a73800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a9500be0 RCX: 0000000001310000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f8db780 RSI: 0000000001710101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa13cd3a73800 R08: ffffa12e3f85b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa12e3f85f140 R11: ffffda91d59da200 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa13cd3a737a0 R14: ffffa130a9500948 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f38ccb1d700(0000) GS:ffffa12e3f840000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 000000000124f178 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? load_balance+0x178/0x9a0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#39 stuck for 22s! [mdt_io03_033:41722] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 39 PID: 41722 Comm: mdt_io03_033 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa12114f0a080 ti: ffffa12d8d798000 task.ti: ffffa12d8d798000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12d8d79b800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a610a500 RCX: 0000000001390000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa12e3f7db780 RSI: 0000000000f10101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12d8d79b800 R08: ffffa13e7f65b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa13e7f65f140 R11: ffffda920b340e00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa12d8d79b7a0 R14: ffffa130a610a268 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f28fc018840(0000) GS:ffffa13e7f640000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007fff565b7e28 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kfree+0x106/0x140 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_free_reply_data+0x128/0x3b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#40 stuck for 22s! [mdt_io00_028:41672] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 40 PID: 41672 Comm: mdt_io00_028 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa12a51e7e180 ti: ffffa12df7f5c000 task.ti: ffffa12df7f5c000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12df7f5f800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179dd7d740 RCX: 0000000001410000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa11e3f89b780 RSI: 0000000001490101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12df7f5f800 R08: ffffa10e3f09b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa10e3f09f140 R11: ffffda9191921000 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa12df7f5f7a0 R14: ffffa1179dd7d4a8 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f8f6cd9d880(0000) GS:ffffa10e3f080000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f8f6cdb3000 CR3: 0000004022bfc000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? load_balance+0x1be/0x9a0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [mdt01_031:41137] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 41 PID: 41137 Comm: mdt01_031 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e34580000 ti: ffffa11e366c0000 task.ti: ffffa11e366c0000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa11e366c3930 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa129c7c0c5e8 RCX: 0000000001490000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa13e7f6db780 RSI: 0000000001790101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa11e366c3930 R08: ffffa11e3f89b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa11e3f89f0c0 R11: ffffda919bac6780 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa11e366c38c0 R15: ffffa11e366c3970 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa11e3f880000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_unlink+0x813/0x14b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#42 stuck for 22s! [mdt_io02_017:41591] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 42 PID: 41591 Comm: mdt_io02_017 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa12ddfd31040 ti: ffffa12d54f0c000 task.ti: ffffa12d54f0c000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x126/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12d54f0f800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179ddf5740 RCX: 0000000001510000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa13e7f65b780 RSI: 0000000001390101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12d54f0f800 R08: ffffa12e3f89b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa12e3f89f140 R11: ffffda91c31d4e00 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa12d54f0f7a0 R14: ffffa1179ddf54a8 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f94d47aa880(0000) GS:ffffa12e3f880000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f94c30e1010 CR3: 000000202a620000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lustre_msg_buf+0x17/0x60 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? wake_up_state+0x20/0x20 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2 Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#43 stuck for 22s! [ldlm_cn03_028:61247] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 43 PID: 61247 Comm: ldlm_cn03_028 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa129b1a82080 ti: ffffa128f4b88000 task.ti: ffffa128f4b88000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa128f4b8b880 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa12b7c0c56a8 RCX: 0000000001590000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa11e3f61b780 RSI: 0000000000090101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa128f4b8b880 R08: ffffa13e7f69b780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa13e7f69f0c0 R11: ffffda91c3097340 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffffffc13da26c R14: ffffa128f4b8b810 R15: ffffa128f4b8b8c0 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f9cd359a740(0000) GS:ffffa13e7f680000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f9cd31851cc CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_del+0x32/0x70 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_clear_inode+0x41/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_free_inode+0x10b/0x610 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_iloc_dirty+0x68/0x80 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_evict_inode+0x472/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldiskfs_mark_inode_dirty+0x6f/0x210 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_evict_inode+0x57d/0x630 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] evict+0xb4/0x180 Nov 05 20:58:18 fir-md1-s1 kernel: [] iput+0xfc/0x190 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_object_delete+0x1d2/0x330 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_free.isra.32+0x68/0x170 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_from_key+0x38/0xb0 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lu_object_put+0xc5/0x3d0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_punch_hpreq_fini+0x10/0x10 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_dom_discard_cp_ast+0xb1/0x2b0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_set_wait+0x72/0x790 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kmem_cache_alloc_node_trace+0x11d/0x210 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_prep_set+0xd2/0x280 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] __ldlm_reprocess_all+0x11f/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_reprocess_all+0x13/0x20 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_request_cancel+0x42f/0x780 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_handle_cancel+0x232/0x2b0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldlm_cancel_handler+0x158/0x590 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#44 stuck for 22s! [mdt_io00_001:40674] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 44 PID: 40674 Comm: mdt_io00_001 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa1113b326180 ti: ffffa1126fef8000 task.ti: ffffa1126fef8000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa1126fefb800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa12b7c054ee0 RCX: 0000000001610000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa10e3f01b780 RSI: 0000000001010101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa1126fefb800 R08: ffffa10e3f0db780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa10e3f0df140 R11: ffffda920a9b6800 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa1126fefb7a0 R14: ffffa12b7c054c48 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f5420c78700(0000) GS:ffffa10e3f0c0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 00007f636f0e8000 CR3: 0000003014466000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __slab_free+0x81/0x2f0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? account_entity_dequeue+0xae/0xd0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? dequeue_entity+0x11c/0x5e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [mdt_io02_046:41773] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 46 PID: 41773 Comm: mdt_io02_046 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa11e283c6180 ti: ffffa11e2a6e8000 task.ti: ffffa11e2a6e8000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa11e2a6eb800 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa121143a2500 RCX: 0000000001710000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa13e7f45b780 RSI: 0000000000390101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa11e2a6eb800 R08: ffffa12e3f8db780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa12e3f8df140 R11: ffffda91c30ef600 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa11e2a6eb7a0 R14: ffffa121143a2268 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f3f2c767700(0000) GS:ffffa12e3f8c0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 000000000280e248 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#47 stuck for 23s! [mdt03_107:41490] Nov 05 20:58:18 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:18 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:18 fir-md1-s1 kernel: CPU: 47 PID: 41490 Comm: mdt03_107 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:18 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:18 fir-md1-s1 kernel: task: ffffa12ce5d10000 ti: ffffa12a3a350000 task.ti: ffffa12a3a350000 Nov 05 20:58:18 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:18 fir-md1-s1 kernel: RSP: 0018:ffffa12a3a353510 EFLAGS: 00000246 Nov 05 20:58:18 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa130a137ec30 RCX: 0000000001790000 Nov 05 20:58:18 fir-md1-s1 kernel: RDX: ffffa13e7f4db780 RSI: 0000000000790101 RDI: ffffa13e3710f480 Nov 05 20:58:18 fir-md1-s1 kernel: RBP: ffffa12a3a353510 R08: ffffa13e7f6db780 R09: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R10: ffffa13e7f6df140 R11: ffffda91b8d46800 R12: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: R13: ffffa12a3a3534b0 R14: ffffa130a137e998 R15: 0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: FS: 00007f67bd132740(0000) GS:ffffa13e7f6c0000(0000) knlGS:0000000000000000 Nov 05 20:58:18 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:18 fir-md1-s1 kernel: CR2: 0000000000412480 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:18 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:18 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:18 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_counter_sub+0xc1/0x130 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? dynlock_unlock+0x194/0x1e0 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? iam_path_release+0x42/0x60 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_getblk+0x65/0x200 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_bread+0x27/0xc0 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_append+0x81/0x150 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_init_new_dir+0xcf/0x230 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __brelse+0x3d/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ldiskfs_add_dot_dotdot+0x4e/0x90 [ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_add_dot_dotdot_internal.isra.76+0x5f/0x80 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] osd_index_ea_insert+0xbaa/0x12f0 [osd_ldiskfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] lod_sub_insert+0x1c1/0x340 [lod] Nov 05 20:58:18 fir-md1-s1 kernel: [] lod_insert+0x24/0x30 [lod] Nov 05 20:58:18 fir-md1-s1 kernel: [] __mdd_index_insert_only+0x1cc/0x280 [mdd] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdd_create_object+0x6c8/0x820 [mdd] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdd_create+0xe31/0x14e0 [mdd] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_create+0xb54/0x1090 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? lprocfs_stats_lock+0x24/0xd0 [obdclass] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_create+0x16b/0x360 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_rec+0x83/0x210 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? mdt_thread_info_init+0xa4/0x1e0 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] mdt_reint+0x67/0x140 [mdt] Nov 05 20:58:18 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:18 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:18 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:18 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:18 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: mdt_io: This server is not able to keep up with request traffic (cpu-bound). Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 41569:0:(service.c:1541:ptlrpc_at_check_timed()) earlyQ=6 reqQ=0 recA=6, svcEst=1, delay=58373 Nov 05 20:58:18 fir-md1-s1 kernel: LustreError: 41994:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.9.117.23@o2ib4: deadline 6:53s ago req@ffffa12e103b9850 x1649309009656736/t0(0) o4->ec5357f4-3a41-e113-e586-2392fb551089@10.9.117.23@o2ib4:169/0 lens 488/0 e 0 to 0 dl 1573016244 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 41569:0:(service.c:1322:ptlrpc_at_send_early_reply()) @@@ Already past deadline (-53s), not sending early reply. Consider increasing at_early_margin (5)? req@ffffa12e103b9850 x1649309009656736/t0(0) o4->ec5357f4-3a41-e113-e586-2392fb551089@10.9.117.23@o2ib4:169/0 lens 488/0 e 0 to 0 dl 1573016244 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 41569:0:(service.c:1322:ptlrpc_at_send_early_reply()) Skipped 1 previous similar message Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 22971:0:(service.c:2011:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 59s req@ffffa130c752b180 x1649308741475696/t0(0) o103->e5f223a6-6328-6c0e-211f-85c2a9cfaa07@10.9.116.14@o2ib4:0/0 lens 328/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 41994:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:53s); client may timeout. req@ffffa12e103b9850 x1649309009656736/t0(0) o4->ec5357f4-3a41-e113-e586-2392fb551089@10.9.117.23@o2ib4:169/0 lens 488/0 e 0 to 0 dl 1573016244 ref 2 fl Interpret:/0/ffffffff rc 0/-1 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 41994:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 6 previous similar messages Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 40955:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573016238/real 1573016238] req@ffffa130a7bb4c80 x1649331621536752/t0(0) o104->fir-MDT0000@10.9.102.22@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573016245 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: 40955:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 6 previous similar messages Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: fir-OST0026-osc-MDT0000: Connection to fir-OST0026 (at 10.0.10.107@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.10.54@o2ib7, removing former export from same NID Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: MGS: Connection restored to 10.0.10.54@o2ib7 (at 10.0.10.54@o2ib7) Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fir-MDT0000-lwp-OST000a_UUID (at 10.0.10.101@o2ib7) reconnecting Nov 05 20:58:18 fir-md1-s1 kernel: LustreError: 41802:0:(sec.c:2485:sptlrpc_svc_unwrap_bulk()) @@@ truncated bulk GET 0(16430) req@ffffa13d7f2dd050 x1649295093014704/t0(0) o4->8cea164c-9518-4a0e-6f8c-c6ee346f8a71@10.9.109.58@o2ib4:169/0 lens 488/448 e 0 to 0 dl 1573016244 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 8cea164c-9518-4a0e-6f8c-c6ee346f8a71 (at 10.9.109.58@o2ib4), client will retry: rc = -110 Nov 05 20:58:18 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 20:58:34 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [mdt_io00_016:41638] Nov 05 20:58:34 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:34 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:34 fir-md1-s1 kernel: CPU: 8 PID: 41638 Comm: mdt_io00_016 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:34 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:34 fir-md1-s1 kernel: task: ffffa13947bde180 ti: ffffa13ce1bc4000 task.ti: ffffa13ce1bc4000 Nov 05 20:58:34 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:34 fir-md1-s1 kernel: RSP: 0018:ffffa13ce1bc7800 EFLAGS: 00000246 Nov 05 20:58:34 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179dd3b5c0 RCX: 0000000000410000 Nov 05 20:58:34 fir-md1-s1 kernel: RDX: ffffa10e3ef1b780 RSI: 0000000000810101 RDI: ffffa13e3710f480 Nov 05 20:58:34 fir-md1-s1 kernel: RBP: ffffa13ce1bc7800 R08: ffffa10e3ee9b780 R09: 0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: R10: ffffa10e3ee9f140 R11: ffffda916229f000 R12: 0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: R13: ffffa13ce1bc77a0 R14: ffffa1179dd3b328 R15: 0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: FS: 00007f7a69e73740(0000) GS:ffffa10e3ee80000(0000) knlGS:0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:34 fir-md1-s1 kernel: CR2: 00007f6f9994f000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:58:34 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:34 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:34 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:34 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ___slab_alloc+0x209/0x4f0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:34 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:34 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:34 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? lustre_msg_buf_v2+0x1e0/0x1e0 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? update_curr+0x14c/0x1e0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? __enqueue_entity+0x78/0x80 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? enqueue_entity+0x2ef/0xbe0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:34 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:34 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.52@o2ib7 (6): c: 6, oc: 0, rc: 8 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 19754:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41554:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41554:0:(sec.c:2191:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.8.27.8@o2ib6 x1649289154295040 Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19760:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: fir-OST0000-osc-MDT0000: Connection to fir-OST0000 (at 10.0.10.101@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: Skipped 17 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Timed out tx: tx_queue, 1 seconds Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3350:kiblnd_check_txs_locked()) Skipped 4 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Timed out RDMA with 10.0.10.103@o2ib7 (7): c: 0, oc: 1, rc: 8 Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19748:0:(o2iblnd_cb.c:3425:kiblnd_check_conns()) Skipped 4 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: fir-OST0013-osc-MDT0000: Connection to fir-OST0013 (at 10.0.10.104@o2ib7) was lost; in progress operations using this service will wait for recovery to complete Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: Skipped 19 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 19760:0:(lib-msg.c:822:lnet_is_health_check()) Msg is in inconsistent state, don't perform health checking (-103, 0) Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 19760:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 19760:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41569:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41569:0:(sec.c:2191:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.104.20@o2ib4 x1648687861855936 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 19760:0:(events.c:305:request_in_callback()) event type 2, status -5, service mdt_io Nov 05 20:58:34 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 2 seconds Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41237:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.8.18.30@o2ib6 from Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41237:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.8.18.30@o2ib6 from Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41237:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 212982 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.101@o2ib7: 3 seconds Nov 05 20:58:34 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 115 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41481:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.117.35@o2ib4 from Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41481:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 452258 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41452:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.9.117.38@o2ib4 from Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41452:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 933387 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.202@o2ib7: 6 seconds Nov 05 20:58:34 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 44 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41314:0:(lib-move.c:2007:lnet_handle_find_routed_path()) no route to 10.8.27.25@o2ib6 from Nov 05 20:58:34 fir-md1-s1 kernel: LNetError: 41314:0:(lib-move.c:2007:lnet_handle_find_routed_path()) Skipped 1973484 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:58:34 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:58:34 fir-md1-s1 kernel: CPU: 0 PID: 41794 Comm: mdt_io00_052 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:58:34 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:58:34 fir-md1-s1 kernel: task: ffffa137ca681040 ti: ffffa10e221fc000 task.ti: ffffa10e221fc000 Nov 05 20:58:34 fir-md1-s1 kernel: RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x122/0x200 Nov 05 20:58:34 fir-md1-s1 kernel: RSP: 0018:ffffa10e221ff800 EFLAGS: 00000246 Nov 05 20:58:34 fir-md1-s1 kernel: RAX: 0000000000000000 RBX: ffffa1179ddeb5c0 RCX: 0000000000010000 Nov 05 20:58:34 fir-md1-s1 kernel: RDX: ffffa10e3f09b780 RSI: 0000000001410101 RDI: ffffa13e3710f480 Nov 05 20:58:34 fir-md1-s1 kernel: RBP: ffffa10e221ff800 R08: ffffa10e3ee1b780 R09: 0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: R10: ffffa10e3ee1f140 R11: ffffda91b8d49400 R12: 0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: R13: ffffa10e221ff7a0 R14: ffffa1179ddeb328 R15: 0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: FS: 00007f4fff165880(0000) GS:ffffa10e3ee00000(0000) knlGS:0000000000000000 Nov 05 20:58:34 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:58:34 fir-md1-s1 kernel: CR2: 00007f4fec48b9e4 CR3: 000000364fa10000 CR4: 00000000003407f0 Nov 05 20:58:34 fir-md1-s1 kernel: Call Trace: Nov 05 20:58:34 fir-md1-s1 kernel: [] queued_spin_lock_slowpath+0xb/0xf Nov 05 20:58:34 fir-md1-s1 kernel: [] _raw_spin_lock+0x20/0x30 Nov 05 20:58:34 fir-md1-s1 kernel: [] ldiskfs_es_lru_add+0x57/0x90 [ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ldiskfs_ext_map_blocks+0x7b5/0xf60 [ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get+0x52/0xe0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? kiblnd_post_tx_locked+0x7bb/0xa50 [ko2iblnd] Nov 05 20:58:34 fir-md1-s1 kernel: [] ldiskfs_map_blocks+0x98/0x700 [ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? cfs_hash_bd_lookup_intent+0x63/0x170 [libcfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get_ts64+0x52/0xf0 Nov 05 20:58:34 fir-md1-s1 kernel: [] osd_ldiskfs_map_inode_pages+0x143/0x420 [osd_ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: 41481:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1573016313/real 1573016313] req@ffffa130bcec4c80 x1649331621536592/t0(0) o104->fir-MDT0000@10.9.117.35@o2ib4:15/16 lens 296/224 e 0 to 1 dl 1573016320 ref 1 fl Rpc:eX/2/ffffffff rc 0/-1 Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: 41481:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 3945740 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: [] osd_write_prep+0x2b6/0x360 [osd_ldiskfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] mdt_obd_preprw+0x65b/0x10a0 [mdt] Nov 05 20:58:34 fir-md1-s1 kernel: [] tgt_brw_write+0xc7c/0x1cf0 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? load_balance+0x1be/0x9a0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? tgt_lookup_reply+0x2d/0x190 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs] Nov 05 20:58:34 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? __wake_up+0x44/0x50 Nov 05 20:58:34 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] ? __schedule+0x42a/0x860 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc] Nov 05 20:58:34 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:34 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 20:58:34 fir-md1-s1 kernel: [] ? insert_kthread_work+0x40/0x40 Nov 05 20:58:34 fir-md1-s1 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 80 b7 01 00 48 03 14 c5 e0 bf 54 bf 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41591:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa12350244c00 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41597:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa130a7948000 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41794:0:(ldlm_lib.c:3205:target_bulk_io()) @@@ bulk WRITE failed: rc -107 req@ffffa10e0b0ff850 x1648331041084176/t0(0) o4->4737d7cc-3e1f-a8cc-964f-c8d597fce061@10.8.27.25@o2ib6:194/0 lens 488/448 e 1 to 0 dl 1573016269 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41746:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa130bd3a5200 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41722:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa130cd03a600 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41568:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa106a9f21400 Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 4737d7cc-3e1f-a8cc-964f-c8d597fce061 (at 10.8.27.25@o2ib6), client will retry: rc = -107 Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: 41794:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (31:45s); client may timeout. req@ffffa10e0b0ff850 x1648331041084176/t0(0) o4->4737d7cc-3e1f-a8cc-964f-c8d597fce061@10.8.27.25@o2ib6:194/0 lens 488/448 e 1 to 0 dl 1573016269 ref 1 fl Complete:/0/ffffffff rc -107/-1 Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: 41794:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 138 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: fir-MDT0000: Received new LWP connection from 10.0.10.53@o2ib7, removing former export from same NID Nov 05 20:58:34 fir-md1-s1 kernel: Lustre: Skipped 7 previous similar messages Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 19750:0:(events.c:450:server_bulk_callback()) event type 5, status -5, desc ffffa120ebd50200 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41746:0:(ldlm_lib.c:3246:target_bulk_io()) @@@ timeout on bulk WRITE after -70+70s req@ffffa136d8a31850 x1648688996494592/t0(0) o4->49b7ea94-4577-eca3-1515-b1c520941f2a@10.9.104.43@o2ib4:169/0 lens 488/448 e 0 to 0 dl 1573016244 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41626:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa1068a7c3c00 Nov 05 20:58:34 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:1510:kiblnd_reconnect_peer()) Abort reconnection of 10.0.10.102@o2ib7: accepting Nov 05 20:58:34 fir-md1-s1 kernel: LustreError: 41632:0:(events.c:450:server_bulk_callback()) event type 5, status -113, desc ffffa123674dd000 Nov 05 20:58:36 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.11.28@o2ib6, removing former export from same NID Nov 05 20:58:36 fir-md1-s1 kernel: Lustre: Skipped 116 previous similar messages Nov 05 20:58:37 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client fe87a86c-3664-505a-689c-812fcee93a05 (at 10.8.31.1@o2ib6) reconnecting Nov 05 20:58:37 fir-md1-s1 kernel: Lustre: Skipped 10 previous similar messages Nov 05 20:58:41 fir-md1-s1 kernel: Lustre: 41481:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:77s); client may timeout. req@ffffa130aac3f080 x1649308983802656/t555180001857(0) o36->e0e49ae1-2aa0-5676-77c0-53375f12b932@10.9.117.35@o2ib4:169/0 lens 512/424 e 0 to 0 dl 1573016244 ref 1 fl Complete:/0/0 rc 0/0 Nov 05 20:58:41 fir-md1-s1 kernel: Lustre: 41481:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 23 previous similar messages Nov 05 20:58:41 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.0.10.113@o2ib7, removing former export from same NID Nov 05 20:58:41 fir-md1-s1 kernel: Lustre: Skipped 177 previous similar messages Nov 05 20:58:51 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.21.27@o2ib6, removing former export from same NID Nov 05 20:58:51 fir-md1-s1 kernel: Lustre: Skipped 72 previous similar messages Nov 05 20:58:57 fir-md1-s1 kernel: LustreError: 21591:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 99s: evicting client at 10.8.27.25@o2ib6 ns: mdt-fir-MDT0000_UUID lock: ffffa11d579321c0/0x675683373fef52b lrc: 3/0,0 mode: CR/CR res: [0x200037afa:0x14037:0x0].0x0 bits 0x9/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.8.27.25@o2ib6 remote: 0xbd4f893938c4d07 expref: 545496 pid: 41053 timeout: 105614 lvb_type: 0 Nov 05 20:58:57 fir-md1-s1 kernel: LustreError: 41445:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffffa12dd9d05c00 ns: mdt-fir-MDT0000_UUID lock: ffffa136546798c0/0x675683373fef299 lrc: 3/0,0 mode: EX/EX res: [0x200037b02:0x9d3c:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.9.102.22@o2ib4 remote: 0x52a42a6987723bed expref: 549272 pid: 41445 timeout: 0 lvb_type: 3 Nov 05 20:58:57 fir-md1-s1 kernel: Lustre: 41347:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:93s); client may timeout. req@ffffa11f09416300 x1649308983802064/t0(0) o101->e0e49ae1-2aa0-5676-77c0-53375f12b932@10.9.117.35@o2ib4:169/0 lens 480/536 e 0 to 0 dl 1573016244 ref 1 fl Complete:/0/0 rc -107/-107 Nov 05 20:58:57 fir-md1-s1 kernel: LustreError: 41445:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) Skipped 443 previous similar messages Nov 05 20:59:00 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [swapper/20:0] Nov 05 20:59:00 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:59:00 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:59:00 fir-md1-s1 kernel: CPU: 20 PID: 0 Comm: swapper/20 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:59:00 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:59:00 fir-md1-s1 kernel: task: ffffa0ff69460000 ti: ffffa0ff69468000 task.ti: ffffa0ff69468000 Nov 05 20:59:00 fir-md1-s1 kernel: RIP: 0010:[] [] native_safe_halt+0xb/0x20 Nov 05 20:59:00 fir-md1-s1 kernel: RSP: 0018:ffffa0ff6946bea8 EFLAGS: 00000246 Nov 05 20:59:00 fir-md1-s1 kernel: RAX: ffffffffbef6cd70 RBX: 0000000000000202 RCX: 0100000000000000 Nov 05 20:59:00 fir-md1-s1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046 Nov 05 20:59:00 fir-md1-s1 kernel: RBP: ffffa0ff6946bea8 R08: 0000000000000000 R09: 0000000000000001 Nov 05 20:59:00 fir-md1-s1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa10e3ef5ab80 Nov 05 20:59:00 fir-md1-s1 kernel: NMI watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [swapper/41:0] Nov 05 20:59:00 fir-md1-s1 kernel: Modules linked in: lustre(OE) mdc(OE) mgs(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lmv(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dell_rbu sunrpc vfat fat dm_round_robin amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul dcdbas glue_helper ablk_helper ses dm_multipath enclosure ipmi_si cryptd sg dm_mod ipmi_devintf pcspkr ccp k10temp i2c_piix4 ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx5_ib(OE) Nov 05 20:59:00 fir-md1-s1 kernel: ib_uverbs(OE) i2c_algo_bit ib_core(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops mlx5_core(OE) ttm libahci mlxfw(OE) devlink mpt3sas(OE) mlx_compat(OE) tg3 drm raid_class crct10dif_pclmul crct10dif_common ptp libata megaraid_sas scsi_transport_sas crc32c_intel drm_panel_orientation_quirks pps_core [last unloaded: mdc] Nov 05 20:59:00 fir-md1-s1 kernel: CPU: 41 PID: 0 Comm: swapper/41 Kdump: loaded Tainted: G OEL ------------ 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 Nov 05 20:59:00 fir-md1-s1 kernel: Hardware name: Dell Inc. PowerEdge R6415/065PKD, BIOS 1.10.6 08/15/2019 Nov 05 20:59:00 fir-md1-s1 kernel: task: ffffa10ee982d140 ti: ffffa10ee983c000 task.ti: ffffa10ee983c000 Nov 05 20:59:00 fir-md1-s1 kernel: RIP: 0010:[] [] native_safe_halt+0xb/0x20 Nov 05 20:59:00 fir-md1-s1 kernel: RSP: 0018:ffffa10ee983fea8 EFLAGS: 00000246 Nov 05 20:59:00 fir-md1-s1 kernel: RAX: ffffffffbef6cd70 RBX: 0000000000000202 RCX: 0100000000000000 Nov 05 20:59:00 fir-md1-s1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046 Nov 05 20:59:00 fir-md1-s1 kernel: RBP: ffffa10ee983fea8 R08: 0000000000000000 R09: 0000000000000001 Nov 05 20:59:00 fir-md1-s1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffa11e3f89ab80 Nov 05 20:59:00 fir-md1-s1 kernel: R13: 0000000000000000 R14: ffffa10ee982d7d8 R15: 00000001e983fe08 Nov 05 20:59:00 fir-md1-s1 kernel: FS: 00007f3f2c704700(0000) GS:ffffa11e3f880000(0000) knlGS:0000000000000000 Nov 05 20:59:00 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:59:00 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000364fa10000 CR4: 00000000003407e0 Nov 05 20:59:00 fir-md1-s1 kernel: Call Trace: Nov 05 20:59:00 fir-md1-s1 kernel: [] default_idle+0x1e/0xc0 Nov 05 20:59:00 fir-md1-s1 kernel: [] arch_cpu_idle+0x20/0xc0 Nov 05 20:59:00 fir-md1-s1 kernel: [] cpu_startup_entry+0x14a/0x1e0 Nov 05 20:59:00 fir-md1-s1 kernel: [] start_secondary+0x1f7/0x270 Nov 05 20:59:00 fir-md1-s1 kernel: [] start_cpu+0x5/0x14 Nov 05 20:59:00 fir-md1-s1 kernel: Code: 30 e9 7a ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 00 2d d5 af 09 00 e9 3d ff ff ff 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 fb f4 <5d> c3 0f 1f 00 0f 00 2d 29 bf 09 00 eb f0 0f 1f 80 00 00 00 00 Nov 05 20:59:00 fir-md1-s1 kernel: R13: 0000000000000000 R14: ffffa0ff69460698 R15: 000000016946be08 Nov 05 20:59:00 fir-md1-s1 kernel: FS: 00007f5587c09880(0000) GS:ffffa10e3ef40000(0000) knlGS:0000000000000000 Nov 05 20:59:00 fir-md1-s1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 05 20:59:00 fir-md1-s1 kernel: CR2: 00007f3f2c7d8000 CR3: 000000101efc0000 CR4: 00000000003407e0 Nov 05 20:59:00 fir-md1-s1 kernel: Call Trace: Nov 05 20:59:00 fir-md1-s1 kernel: [] default_idle+0x1e/0xc0 Nov 05 20:59:00 fir-md1-s1 kernel: [] arch_cpu_idle+0x20/0xc0 Nov 05 20:59:00 fir-md1-s1 kernel: [] cpu_startup_entry+0x14a/0x1e0 Nov 05 20:59:00 fir-md1-s1 kernel: [] start_secondary+0x1f7/0x270 Nov 05 20:59:00 fir-md1-s1 kernel: [] start_cpu+0x5/0x14 Nov 05 20:59:00 fir-md1-s1 kernel: Code: 30 e9 7a ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 00 2d d5 af 09 00 e9 3d ff ff ff 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 fb f4 <5d> c3 0f 1f 00 0f 00 2d 29 bf 09 00 eb f0 0f 1f 80 00 00 00 00 Nov 05 20:59:07 fir-md1-s1 kernel: LustreError: 41601:0:(ldlm_lib.c:3262:target_bulk_io()) @@@ network error on bulk WRITE req@ffffa113c83ad050 x1648297068145680/t0(0) o4->8f2648b4-4022-d79e-18a5-f850119b4e30@10.8.17.16@o2ib6:278/0 lens 488/448 e 2 to 0 dl 1573016353 ref 1 fl Interpret:/0/0 rc 0/0 Nov 05 20:59:07 fir-md1-s1 kernel: Lustre: fir-MDT0000: Bulk IO write error with 8f2648b4-4022-d79e-18a5-f850119b4e30 (at 10.8.17.16@o2ib6), client will retry: rc = -110 Nov 05 20:59:07 fir-md1-s1 kernel: Lustre: Skipped 9 previous similar messages Nov 05 20:59:09 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.109.23@o2ib4, removing former export from same NID Nov 05 20:59:09 fir-md1-s1 kernel: Lustre: Skipped 112 previous similar messages Nov 05 20:59:14 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 3936ad17-21b5-88d3-e6ad-12f20dd12ac6 (at 10.9.106.10@o2ib4) reconnecting Nov 05 20:59:14 fir-md1-s1 kernel: Lustre: Skipped 336 previous similar messages Nov 05 20:59:17 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out tx for 10.0.10.203@o2ib7: 0 seconds Nov 05 20:59:17 fir-md1-s1 kernel: LNet: 19748:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Skipped 17 previous similar messages Nov 05 20:59:33 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.108.31@o2ib4) Nov 05 20:59:33 fir-md1-s1 kernel: Lustre: Skipped 2740 previous similar messages Nov 05 20:59:53 fir-md1-s1 kernel: LustreError: 40955:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffffa12dd9d05c00 ns: mdt-fir-MDT0000_UUID lock: ffffa139f9219d40/0x675683373fef690 lrc: 3/0,0 mode: EX/EX res: [0x200037b02:0x9d3d:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.9.102.22@o2ib4 remote: 0x52a42a6987723bfb expref: 307918 pid: 40955 timeout: 0 lvb_type: 3 Nov 05 20:59:53 fir-md1-s1 kernel: Lustre: 40955:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:149s); client may timeout. req@ffffa130bb3fba80 x1648886750820768/t555180814354(0) o101->40df94b7-4e65-3458-a595-b9607572f9d8@10.9.102.22@o2ib4:169/0 lens 376/1568 e 0 to 0 dl 1573016244 ref 1 fl Complete:/0/0 rc -107/-107 Nov 05 20:59:53 fir-md1-s1 kernel: Lustre: 40955:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 5 previous similar messages Nov 05 21:00:02 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.9.114.15@o2ib4, removing former export from same NID Nov 05 21:00:02 fir-md1-s1 kernel: Lustre: Skipped 822 previous similar messages Nov 05 21:00:38 fir-md1-s1 kernel: LNet: Service thread pid 41314 was inactive for 200.27s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 05 21:00:38 fir-md1-s1 kernel: Pid: 41314, comm: mdt01_067 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:00:38 fir-md1-s1 kernel: Call Trace: Nov 05 21:00:38 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:00:38 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:00:38 fir-md1-s1 kernel: [] mdt_layout_change+0x2a4/0x430 [mdt] Nov 05 21:00:38 fir-md1-s1 kernel: [] mdt_intent_layout+0x7ee/0xcc0 [mdt] Nov 05 21:00:38 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:00:38 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:00:38 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:00:38 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:00:38 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:00:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1573016438.41314 Nov 05 21:00:39 fir-md1-s1 kernel: LNet: Service thread pid 41089 was inactive for 201.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 05 21:00:39 fir-md1-s1 kernel: Pid: 41089, comm: mdt02_021 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:00:39 fir-md1-s1 kernel: Call Trace: Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_layout_change+0x2a4/0x430 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_intent_layout+0x7ee/0xcc0 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:00:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:00:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:00:39 fir-md1-s1 kernel: Pid: 41321, comm: mdt03_066 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:00:39 fir-md1-s1 kernel: Call Trace: Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_layout_change+0x2a4/0x430 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_intent_layout+0x7ee/0xcc0 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:00:39 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:00:39 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:00:39 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:00:49 fir-md1-s1 kernel: LustreError: 41314:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffffa12e32504000 ns: mdt-fir-MDT0000_UUID lock: ffffa11cbfa8d7c0/0x675683373fefa9c lrc: 3/0,0 mode: EX/EX res: [0x200037afa:0x14037:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.8.27.25@o2ib6 remote: 0xbd4f893938c4d2a expref: 201791 pid: 41314 timeout: 0 lvb_type: 3 Nov 05 21:00:49 fir-md1-s1 kernel: LustreError: 41314:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) Skipped 1 previous similar message Nov 05 21:00:49 fir-md1-s1 kernel: Lustre: 41314:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:205s); client may timeout. req@ffffa114dbf27980 x1648331041083952/t555182678786(0) o101->4737d7cc-3e1f-a8cc-964f-c8d597fce061@10.8.27.25@o2ib6:169/0 lens 376/1568 e 0 to 0 dl 1573016244 ref 1 fl Complete:/0/0 rc -107/-107 Nov 05 21:00:49 fir-md1-s1 kernel: Lustre: 41314:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1 previous similar message Nov 05 21:00:49 fir-md1-s1 kernel: LNet: Service thread pid 41314 completed after 211.66s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Nov 05 21:01:52 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client c5f752ad-463e-c0fc-bbb6-5b29206ddbd4 (at 10.9.109.53@o2ib4) reconnecting Nov 05 21:01:52 fir-md1-s1 kernel: Lustre: Skipped 1024 previous similar messages Nov 05 21:02:18 fir-md1-s1 kernel: LustreError: 41089:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573016238, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffffa12b41ed5a00/0x675683373fed6fb lrc: 3/0,1 mode: --/EX res: [0x200037af8:0x1f1cf:0x0].0x0 bits 0x8/0x0 rrc: 5 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 41089 timeout: 0 lvb_type: 0 Nov 05 21:02:18 fir-md1-s1 kernel: LustreError: 41089:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message Nov 05 21:02:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.17.15@o2ib6, removing former export from same NID Nov 05 21:02:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Nov 05 21:02:40 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.8.17.15@o2ib6) Nov 05 21:02:40 fir-md1-s1 kernel: Lustre: Skipped 50 previous similar messages Nov 05 21:02:44 fir-md1-s1 kernel: LustreError: 41601:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 40df94b7-4e65-3458-a595-b9607572f9d8 claims 69632 GRANT, real grant 0 Nov 05 21:02:44 fir-md1-s1 kernel: LustreError: 41601:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 62069 previous similar messages Nov 05 21:03:19 fir-md1-s1 kernel: LustreError: 41089:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffffa12e3865ec00 ns: mdt-fir-MDT0000_UUID lock: ffffa12b41ed45c0/0x675683373fed6f4 lrc: 3/0,0 mode: EX/EX res: [0x200037af8:0x1f1cf:0x0].0x0 bits 0x8/0x0 rrc: 3 type: IBT flags: 0x50000000000000 nid: 10.9.116.14@o2ib4 remote: 0x5476b7483e05a6a9 expref: 130906 pid: 41089 timeout: 0 lvb_type: 3 Nov 05 21:03:19 fir-md1-s1 kernel: Lustre: 41089:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (6:355s); client may timeout. req@ffffa11f09410000 x1649308741475792/t555188570730(0) o101->e5f223a6-6328-6c0e-211f-85c2a9cfaa07@10.9.116.14@o2ib4:169/0 lens 376/1568 e 0 to 0 dl 1573016244 ref 1 fl Complete:/0/0 rc -107/-107 Nov 05 21:03:19 fir-md1-s1 kernel: LNet: Service thread pid 41089 completed after 361.39s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Nov 05 21:03:31 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 4737d7cc-3e1f-a8cc-964f-c8d597fce061 (at 10.8.27.25@o2ib6) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa10fdf1bf000, cur 1573016611 expire 1573016461 last 1573016384 Nov 05 21:03:31 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 21:05:15 fir-md1-s1 kernel: LNet: Service thread pid 41321 completed after 477.61s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). Nov 05 21:05:56 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client e92d47ed-91db-5109-317b-b77bbc04ae60 (at 10.9.109.19@o2ib4) reconnecting Nov 05 21:06:33 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 21:06:33 fir-md1-s1 kernel: LustreError: Skipped 21 previous similar messages Nov 05 21:06:49 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client e4afa95b-e7ac-30df-de3d-de81555307ba (at 10.9.115.5@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12e146aa000, cur 1573016809 expire 1573016659 last 1573016582 Nov 05 21:06:49 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 21:12:44 fir-md1-s1 kernel: LustreError: 41570:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 9484b05f-bbba-6023-e820-67c71f8b4c9f claims 155648 GRANT, real grant 0 Nov 05 21:12:44 fir-md1-s1 kernel: LustreError: 41570:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 294158 previous similar messages Nov 05 21:16:35 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 21:16:35 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 21:19:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client b28c253e-3041-3544-86e5-3ee759d202d3 (at 10.9.109.24@o2ib4) reconnecting Nov 05 21:19:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.109.24@o2ib4) Nov 05 21:19:45 fir-md1-s1 kernel: Lustre: Skipped 3 previous similar messages Nov 05 21:22:44 fir-md1-s1 kernel: LustreError: 41729:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli 80e6cf50-c677-9de5-3678-2d22f1110390 claims 147456 GRANT, real grant 0 Nov 05 21:22:44 fir-md1-s1 kernel: LustreError: 41729:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 303655 previous similar messages Nov 05 21:25:22 fir-md1-s1 kernel: Lustre: MGS: Connection restored to (at 10.9.115.5@o2ib4) Nov 05 21:26:37 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 21:26:37 fir-md1-s1 kernel: LustreError: Skipped 23 previous similar messages Nov 05 21:32:23 fir-md1-s1 kernel: LustreError: 21591:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 149s: evicting client at 10.9.117.18@o2ib4 ns: mdt-fir-MDT0000_UUID lock: ffffa115a9cc0d80/0x6756833d051f8f9 lrc: 3/0,0 mode: PW/PW res: [0x200037b41:0xf3f5:0x0].0x0 bits 0x40/0x0 rrc: 701 type: IBT flags: 0x60200400000020 nid: 10.9.117.18@o2ib4 remote: 0x1103a394115afc78 expref: 565 pid: 41367 timeout: 107620 lvb_type: 0 Nov 05 21:32:23 fir-md1-s1 kernel: LustreError: 21591:0:(ldlm_lockd.c:256:expired_lock_main()) Skipped 8 previous similar messages Nov 05 21:32:23 fir-md1-s1 kernel: LustreError: 41164:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) ### lock on destroyed export ffffa12dd9d07c00 ns: mdt-fir-MDT0000_UUID lock: ffffa128ad2045c0/0x6756833d051f9b6 lrc: 3/0,0 mode: PW/PW res: [0x200037b41:0xf3f5:0x0].0x0 bits 0x40/0x0 rrc: 694 type: IBT flags: 0x50200400000020 nid: 10.9.117.18@o2ib4 remote: 0x1103a394115afc7f expref: 45 pid: 41164 timeout: 0 lvb_type: 0 Nov 05 21:32:23 fir-md1-s1 kernel: LustreError: 41164:0:(ldlm_lockd.c:1348:ldlm_handle_enqueue0()) Skipped 1 previous similar message Nov 05 21:33:14 fir-md1-s1 kernel: LNet: Service thread pid 41082 was inactive for 200.38s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 05 21:33:14 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Nov 05 21:33:14 fir-md1-s1 kernel: Pid: 41082, comm: mdt02_019 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:33:14 fir-md1-s1 kernel: Call Trace: Nov 05 21:33:14 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:33:14 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:33:14 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Nov 05 21:33:14 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Nov 05 21:33:14 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Nov 05 21:33:14 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:33:14 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:33:14 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:33:14 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:33:14 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:33:14 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1573018394.41082 Nov 05 21:33:15 fir-md1-s1 kernel: LNet: Service thread pid 41366 was inactive for 201.17s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Nov 05 21:33:15 fir-md1-s1 kernel: Pid: 41366, comm: mdt03_077 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:33:15 fir-md1-s1 kernel: Call Trace: Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:33:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:33:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:33:15 fir-md1-s1 kernel: Pid: 41471, comm: mdt03_098 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:33:15 fir-md1-s1 kernel: Call Trace: Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:33:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:33:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:33:15 fir-md1-s1 kernel: Pid: 40955, comm: mdt03_004 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:33:15 fir-md1-s1 kernel: Call Trace: Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:33:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:33:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:33:15 fir-md1-s1 kernel: Pid: 41146, comm: mdt01_033 3.10.0-957.27.2.el7_lustre.pl1.x86_64 #1 SMP Mon Aug 5 15:28:37 PDT 2019 Nov 05 21:33:15 fir-md1-s1 kernel: Call Trace: Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_completion_ast+0x430/0x860 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_local_lock+0x50b/0xb20 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock_internal+0x70/0x360 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_object_lock+0x20/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_brw_enqueue+0x44b/0x760 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_brw+0x1f/0x30 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] mdt_intent_policy+0x435/0xd80 [mdt] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc] Nov 05 21:33:15 fir-md1-s1 kernel: [] kthread+0xd1/0xe0 Nov 05 21:33:15 fir-md1-s1 kernel: [] ret_from_fork_nospec_begin+0xe/0x21 Nov 05 21:33:15 fir-md1-s1 kernel: [] 0xffffffffffffffff Nov 05 21:33:15 fir-md1-s1 kernel: LNet: Service thread pid 41192 was inactive for 201.73s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Nov 05 21:33:15 fir-md1-s1 kernel: LNet: Skipped 1 previous similar message Nov 05 21:34:36 fir-md1-s1 kernel: LustreError: 41690:0:(tgt_grant.c:758:tgt_grant_check()) fir-MDT0000: cli d3880b5c-b72c-0e9f-b18a-d6299f066ebd claims 36864 GRANT, real grant 0 Nov 05 21:34:36 fir-md1-s1 kernel: LustreError: 41690:0:(tgt_grant.c:758:tgt_grant_check()) Skipped 94511 previous similar messages Nov 05 21:34:40 fir-md1-s1 kernel: Lustre: MGS: Received new LWP connection from 10.8.27.35@o2ib6, removing former export from same NID Nov 05 21:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client 67360d0f-602d-e0fd-a763-b6dc0eec238b (at 10.8.27.35@o2ib6) reconnecting Nov 05 21:34:40 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.27.35@o2ib6) Nov 05 21:34:40 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 21:34:40 fir-md1-s1 kernel: Lustre: Skipped 2 previous similar messages Nov 05 21:34:54 fir-md1-s1 kernel: LustreError: 41441:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573018193, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffffa1365b3457c0/0x6756833d051fc6b lrc: 3/0,1 mode: --/PW res: [0x200037b41:0xf3f5:0x0].0x0 bits 0x40/0x0 rrc: 698 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 41441 timeout: 0 lvb_type: 0 Nov 05 21:34:54 fir-md1-s1 kernel: LustreError: 41441:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 134 previous similar messages Nov 05 21:37:23 fir-md1-s1 kernel: LustreError: 41487:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1573018343, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-fir-MDT0000_UUID lock: ffffa114bb7fb600/0x6756833d0d4aef1 lrc: 3/0,1 mode: --/PW res: [0x200037b41:0xf3f5:0x0].0x0 bits 0x40/0x0 rrc: 698 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 41487 timeout: 0 lvb_type: 0 Nov 05 21:37:23 fir-md1-s1 kernel: LustreError: 41487:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 137 previous similar messages Nov 05 21:37:35 fir-md1-s1 kernel: LNet: Service thread pid 41330 was inactive for 312.51s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Nov 05 21:37:35 fir-md1-s1 kernel: LNet: Skipped 315 previous similar messages Nov 05 21:37:35 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1573018655.41330 Nov 05 21:37:38 fir-md1-s1 kernel: LNet: Service thread pid 41256 was inactive for 314.54s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. Nov 05 21:37:38 fir-md1-s1 kernel: LNet: Skipped 6 previous similar messages Nov 05 21:37:38 fir-md1-s1 kernel: LustreError: dumping log to /tmp/lustre-log.1573018658.41256 Nov 05 21:38:44 fir-md1-s1 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.11.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server. Nov 05 21:38:44 fir-md1-s1 kernel: LustreError: Skipped 7 previous similar messages Nov 05 21:39:48 fir-md1-s1 kernel: Lustre: 41098:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa12fbd009b00 x1649307076459024/t0(0) o101->c5605c5f-5f91-c28c-3550-34c356d6baea@10.9.117.22@o2ib4:453/0 lens 480/568 e 24 to 0 dl 1573018793 ref 2 fl Interpret:/0/0 rc 0/0 Nov 05 21:39:49 fir-md1-s1 kernel: Lustre: 41143:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa12fbce16780 x1648687863523968/t0(0) o101->84fe4322-0bf9-6fc7-4c46-ef148bf26b79@10.9.104.20@o2ib4:454/0 lens 696/0 e 24 to 0 dl 1573018794 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:39:49 fir-md1-s1 kernel: Lustre: 41143:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 592 previous similar messages Nov 05 21:39:52 fir-md1-s1 kernel: Lustre: 41098:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa135e7340d80 x1649295130319968/t0(0) o101->d7b23752-f1f0-8c4b-6c13-f8cb8f537c71@10.9.109.62@o2ib4:457/0 lens 1784/0 e 24 to 0 dl 1573018797 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:39:52 fir-md1-s1 kernel: Lustre: 41098:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 228 previous similar messages Nov 05 21:39:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.9.117.22@o2ib4) Nov 05 21:39:54 fir-md1-s1 kernel: Lustre: Skipped 5 previous similar messages Nov 05 21:39:57 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa11a8c15a880 x1649039388268048/t0(0) o101->262affea-6f08-6e05-c2e8-d629eeb38f83@10.9.107.19@o2ib4:462/0 lens 576/0 e 24 to 0 dl 1573018802 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:39:57 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 35 previous similar messages Nov 05 21:40:06 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa119cb046c00 x1649295862870064/t0(0) o101->5ebbffd8-95c2-3ef5-84d0-408c87dbc1da@10.9.104.24@o2ib4:471/0 lens 1792/0 e 24 to 0 dl 1573018811 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:40:06 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 17 previous similar messages Nov 05 21:40:25 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa115bd37ec00 x1649382136962512/t0(0) o101->3b0a89b7-cc70-d975-2d52-92896d01d45c@10.8.27.23@o2ib6:490/0 lens 480/0 e 12 to 0 dl 1573018830 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:40:25 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 25 previous similar messages Nov 05 21:41:06 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa10f7fe08d80 x1649382136971648/t0(0) o101->3b0a89b7-cc70-d975-2d52-92896d01d45c@10.8.27.23@o2ib6:531/0 lens 376/0 e 6 to 0 dl 1573018871 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:41:06 fir-md1-s1 kernel: Lustre: 41393:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 28 previous similar messages Nov 05 21:42:23 fir-md1-s1 kernel: Lustre: 41486:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa12fc36b5580 x1649321229142592/t0(0) o101->d9a680cf-f48f-730f-ae55-619c940ab227@10.9.110.46@o2ib4:608/0 lens 576/0 e 3 to 0 dl 1573018948 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:42:23 fir-md1-s1 kernel: Lustre: 41486:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 45 previous similar messages Nov 05 21:43:58 fir-md1-s1 kernel: Lustre: fir-MDT0000: haven't heard from client 49a7bb91-1e44-9061-7b3f-d5e25fd318ce (at 10.9.106.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa12d80fe9800, cur 1573019038 expire 1573018888 last 1573018811 Nov 05 21:43:58 fir-md1-s1 kernel: Lustre: Skipped 1 previous similar message Nov 05 21:44:01 fir-md1-s1 kernel: Lustre: MGS: haven't heard from client c9aa50d8-5574-cde7-2d2f-ea1dd8919dad (at 10.9.106.28@o2ib4) in 227 seconds. I think it's dead, and I am evicting it. exp ffffa13dca386800, cur 1573019041 expire 1573018891 last 1573018814 Nov 05 21:44:45 fir-md1-s1 kernel: Lustre: fir-MDT0000: Client a3f851e0-b0cd-6d0d-5953-41c767cde0e1 (at 10.9.101.67@o2ib4) reconnecting Nov 05 21:44:45 fir-md1-s1 kernel: Lustre: Skipped 198 previous similar messages Nov 05 21:44:53 fir-md1-s1 kernel: Lustre: 41119:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffffa13b107af500 x1648397069834720/t0(0) o101->abaf865a-8fcb-451b-e3df-c50916747fa5@10.8.27.11@o2ib6:3/0 lens 376/0 e 1 to 0 dl 1573019098 ref 2 fl New:/0/ffffffff rc 0/-1 Nov 05 21:44:53 fir-md1-s1 kernel: Lustre: 41119:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 55 previous similar messages Nov 05 21:44:54 fir-md1-s1 kernel: Lustre: fir-MDT0000: Connection restored to (at 10.8.20.24@o2ib6) Nov 05 21:44:54 fir-md1-s1 kernel: Lustre: Skipped 195 previous similar messages