Sep 05 05:47:25 fir-md1-s2 kernel: LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
Sep 05 05:47:25 fir-md1-s2 kernel: alg: No test for adler32 (adler32-zlib)
Sep 05 05:47:25 fir-md1-s2 kernel: Lustre: Lustre: Build Version: 2.12.2_119_g2d4809a
Sep 05 05:47:26 fir-md1-s2 kernel: LNet: 39136:0:(config.c:1626:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down
Sep 05 05:47:26 fir-md1-s2 kernel: LNet: Using FastReg for registration
Sep 05 05:47:26 fir-md1-s2 kernel: LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180]
Sep 05 05:47:27 fir-md1-s2 kernel: LNet: 39180:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.0.10.202@o2ib7: 3890 seconds
Sep 05 05:48:37 fir-md1-s2 kernel: LDISKFS-fs (dm-3): file extents enabled, maximum tree depth=5
Sep 05 05:48:37 fir-md1-s2 kernel: LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
Sep 05 05:48:37 fir-md1-s2 kernel: LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Sep 05 05:48:37 fir-md1-s2 kernel: LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
Sep 05 05:48:37 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.18.31@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:48:37 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message
Sep 05 05:48:38 fir-md1-s2 kernel: LustreError: 11-0: fir-OST000e-osc-MDT0001: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
Sep 05 05:48:38 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.18.30@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:48:38 fir-md1-s2 kernel: LustreError: Skipped 8 previous similar messages
Sep 05 05:48:39 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.8.7.15@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:48:39 fir-md1-s2 kernel: LustreError: Skipped 40 previous similar messages
Sep 05 05:48:41 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.23.9@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:48:41 fir-md1-s2 kernel: LustreError: Skipped 212 previous similar messages
Sep 05 05:48:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.102.72@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:48:45 fir-md1-s2 kernel: LustreError: Skipped 434 previous similar messages
Sep 05 05:48:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.8.23.1@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:48:53 fir-md1-s2 kernel: LustreError: Skipped 1066 previous similar messages
Sep 05 05:49:03 fir-md1-s2 kernel: LustreError: 11-0: fir-OST002a-osc-MDT0001: operation ost_connect to node 10.0.10.107@o2ib7 failed: rc = -16
Sep 05 05:49:03 fir-md1-s2 kernel: LustreError: Skipped 47 previous similar messages
Sep 05 05:49:09 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.8.27.28@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:49:09 fir-md1-s2 kernel: LustreError: Skipped 195 previous similar messages
Sep 05 05:49:28 fir-md1-s2 kernel: LustreError: 11-0: fir-OST002a-osc-MDT0001: operation ost_connect to node 10.0.10.107@o2ib7 failed: rc = -16
Sep 05 05:49:28 fir-md1-s2 kernel: LustreError: Skipped 46 previous similar messages
Sep 05 05:49:41 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.104.37@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:49:41 fir-md1-s2 kernel: LustreError: Skipped 1741 previous similar messages
Sep 05 05:49:53 fir-md1-s2 kernel: LustreError: 11-0: fir-OST000e-osc-MDT0001: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
Sep 05 05:49:53 fir-md1-s2 kernel: LustreError: Skipped 35 previous similar messages
Sep 05 05:50:18 fir-md1-s2 kernel: LustreError: 11-0: fir-OST000e-osc-MDT0001: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
Sep 05 05:50:18 fir-md1-s2 kernel: LustreError: Skipped 35 previous similar messages
Sep 05 05:50:43 fir-md1-s2 kernel: LustreError: 11-0: fir-OST000c-osc-MDT0001: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
Sep 05 05:50:43 fir-md1-s2 kernel: LustreError: Skipped 35 previous similar messages
Sep 05 05:50:45 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0000_UUID: not available for connect from 10.9.102.36@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:50:45 fir-md1-s2 kernel: LustreError: Skipped 3230 previous similar messages
Sep 05 05:51:08 fir-md1-s2 kernel: LustreError: 11-0: fir-OST000e-osc-MDT0001: operation ost_connect to node 10.0.10.103@o2ib7 failed: rc = -16
Sep 05 05:51:08 fir-md1-s2 kernel: LustreError: Skipped 35 previous similar messages
Sep 05 05:51:58 fir-md1-s2 kernel: LustreError: 11-0: fir-OST002e-osc-MDT0001: operation ost_connect to node 10.0.10.107@o2ib7 failed: rc = -16
Sep 05 05:51:58 fir-md1-s2 kernel: LustreError: Skipped 15 previous similar messages
Sep 05 05:52:32 fir-md1-s2 kernel: LNet: 39180:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Timed out tx for 10.0.10.202@o2ib7: 0 seconds
Sep 05 05:52:32 fir-md1-s2 kernel: LNet: 39180:0:(o2iblnd_cb.c:3381:kiblnd_check_conns()) Skipped 1 previous similar message
Sep 05 05:52:37 fir-md1-s2 kernel: INFO: task mount.lustre:39383 blocked for more than 120 seconds.
Sep 05 05:52:37 fir-md1-s2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 05 05:52:37 fir-md1-s2 kernel: mount.lustre D ffff8fd0c5af1040 0 39383 39382 0x00000082
Sep 05 05:52:37 fir-md1-s2 kernel: Call Trace:
Sep 05 05:52:37 fir-md1-s2 kernel: [] schedule_preempt_disabled+0x29/0x70
Sep 05 05:52:37 fir-md1-s2 kernel: [] __mutex_lock_slowpath+0xc7/0x1d0
Sep 05 05:52:37 fir-md1-s2 kernel: [] mutex_lock+0x1f/0x2f
Sep 05 05:52:37 fir-md1-s2 kernel: [] mgc_set_info_async+0xa98/0x15f0 [mgc]
Sep 05 05:52:37 fir-md1-s2 kernel: [] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Sep 05 05:52:37 fir-md1-s2 kernel: [] server_start_targets+0x31a/0x2a20 [obdclass]
Sep 05 05:52:37 fir-md1-s2 kernel: [] ? lustre_start_mgc+0x260/0x2510 [obdclass]
Sep 05 05:52:37 fir-md1-s2 kernel: [] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Sep 05 05:52:37 fir-md1-s2 kernel: [] server_fill_super+0x10cc/0x1890 [obdclass]
Sep 05 05:52:37 fir-md1-s2 kernel: [] lustre_fill_super+0x328/0x950 [obdclass]
Sep 05 05:52:37 fir-md1-s2 kernel: [] ? lustre_common_put_super+0x270/0x270 [obdclass]
Sep 05 05:52:37 fir-md1-s2 kernel: [] mount_nodev+0x4f/0xb0
Sep 05 05:52:37 fir-md1-s2 kernel: [] lustre_mount+0x38/0x60 [obdclass]
Sep 05 05:52:37 fir-md1-s2 kernel: [] mount_fs+0x3e/0x1b0
Sep 05 05:52:37 fir-md1-s2 kernel: [] vfs_kern_mount+0x67/0x110
Sep 05 05:52:37 fir-md1-s2 kernel: [] do_mount+0x1ef/0xce0
Sep 05 05:52:37 fir-md1-s2 kernel: [] ? __check_object_size+0x1ca/0x250
Sep 05 05:52:37 fir-md1-s2 kernel: [] ? kmem_cache_alloc_trace+0x3c/0x200
Sep 05 05:52:37 fir-md1-s2 kernel: [] SyS_mount+0x83/0xd0
Sep 05 05:52:37 fir-md1-s2 kernel: [] system_call_fastpath+0x22/0x27
Sep 05 05:52:53 fir-md1-s2 kernel: LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.9.108.40@o2ib4 (no target). If you are running an HA pair check that the target is mounted on the other server.
Sep 05 05:52:53 fir-md1-s2 kernel: LustreError: Skipped 6489 previous similar messages
Sep 05 05:53:14 fir-md1-s2 kernel: LustreError: 11-0: fir-OST002e-osc-MDT0001: operation ost_connect to node 10.0.10.107@o2ib7 failed: rc = -16
Sep 05 05:53:14 fir-md1-s2 kernel: LustreError: Skipped 4 previous similar messages
Sep 05 05:53:38 fir-md1-s2 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Sep 05 05:53:38 fir-md1-s2 kernel: LustreError: 39405:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567687718, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8fb0c2030900/0x5731634ee5f4dc6b lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce13993a928 expref: -99 pid: 39405 timeout: 0 lvb_type: 0
Sep 05 05:53:38 fir-md1-s2 kernel: LustreError: 39703:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fb0fc121140) refcount nonzero (1) after lock cleanup; forcing cleanup.
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: fir-MDD0001: changelog on
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: in recovery but waiting for the first client to connect
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1378 clients reconnect
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: fir-MDT0003: Not available for connect from 10.8.29.2@o2ib6 (not set up)
Sep 05 05:53:38 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Sep 05 05:53:39 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.109.5@o2ib4)
Sep 05 05:53:39 fir-md1-s2 kernel: Lustre: Skipped 422 previous similar messages
Sep 05 05:53:40 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.2.31@o2ib6)
Sep 05 05:53:40 fir-md1-s2 kernel: Lustre: Skipped 20 previous similar messages
Sep 05 05:53:42 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.9.108.48@o2ib4)
Sep 05 05:53:42 fir-md1-s2 kernel: Lustre: Skipped 28 previous similar messages
Sep 05 05:53:46 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.2.4@o2ib6)
Sep 05 05:53:46 fir-md1-s2 kernel: Lustre: Skipped 308 previous similar messages
Sep 05 05:53:54 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 10.8.20.23@o2ib6)
Sep 05 05:53:54 fir-md1-s2 kernel: Lustre: Skipped 313 previous similar messages
Sep 05 05:54:01 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1275 recovered, 58 in progress, and 0 evicted) to recover in 2:07
Sep 05 05:54:26 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1318 recovered, 58 in progress, and 0 evicted) to recover in 1:41
Sep 05 05:54:29 fir-md1-s2 kernel: Lustre: fir-MDT0001: Connection restored to (at 0@lo)
Sep 05 05:54:29 fir-md1-s2 kernel: Lustre: Skipped 277 previous similar messages
Sep 05 05:54:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:54:51 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) to recover in 1:16
Sep 05 05:55:16 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) to recover in 0:51
Sep 05 05:55:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:55:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) Skipped 1 previous similar message
Sep 05 05:55:41 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) to recover in 0:26
Sep 05 05:56:06 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) to recover in 0:01
Sep 05 05:56:08 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:56:08 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) Skipped 2 previous similar messages
Sep 05 05:56:32 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) already passed deadline 0:23
Sep 05 05:56:34 fir-md1-s2 kernel: LustreError: 11-0: fir-MDT0003-osp-MDT0001: operation mds_connect to node 0@lo failed: rc = -11
Sep 05 05:56:34 fir-md1-s2 kernel: LustreError: Skipped 1 previous similar message
Sep 05 05:56:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:57:28 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) already passed deadline 1:19
Sep 05 05:57:28 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Sep 05 05:57:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:57:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) Skipped 2 previous similar messages
Sep 05 05:58:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:58:38 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) Skipped 2 previous similar messages
Sep 05 05:58:39 fir-md1-s2 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Sep 05 05:58:39 fir-md1-s2 kernel: LustreError: 39383:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567688019, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8fb0c2240480/0x5731634ee5f506c6 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 3 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce1399443f3 expref: -99 pid: 39383 timeout: 0 lvb_type: 0
Sep 05 05:58:39 fir-md1-s2 kernel: LustreError: 40296:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fb0c1334600) refcount nonzero (2) after lock cleanup; forcing cleanup.
Sep 05 05:58:39 fir-md1-s2 kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Sep 05 05:58:39 fir-md1-s2 kernel: Lustre: Skipped 1 previous similar message
Sep 05 05:58:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
Sep 05 05:58:39 fir-md1-s2 kernel: Lustre: fir-MDD0003: changelog on
Sep 05 05:58:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: in recovery but waiting for the first client to connect
Sep 05 05:58:39 fir-md1-s2 kernel: Lustre: fir-MDT0003: Will be in recovery for at least 2:30, or until 1378 clients reconnect
Sep 05 05:58:43 fir-md1-s2 kernel: Lustre: fir-MDT0001: Denying connection for new client 73e6004f-03b9-a131-217e-61f1d09f2d8d (at 10.9.101.8@o2ib4), waiting for 1378 known clients (1320 recovered, 58 in progress, and 0 evicted) already passed deadline 2:34
Sep 05 05:58:43 fir-md1-s2 kernel: Lustre: Skipped 3 previous similar messages
Sep 05 05:59:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: recovery is timed out, evict stale exports
Sep 05 05:59:04 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) fir-MDT0001: extended recovery timer reaching hard limit: 900, extend: 1
Sep 05 05:59:04 fir-md1-s2 kernel: Lustre: 39730:0:(ldlm_lib.c:1763:extend_recovery_timer()) Skipped 2 previous similar messages
Sep 05 05:59:04 fir-md1-s2 kernel: Lustre: fir-MDT0001: Recovery over after 5:26, of 1378 clients 1378 recovered and 0 were evicted.
Sep 05 05:59:11 fir-md1-s2 kernel: Lustre: fir-MDT0003: Recovery over after 0:32, of 1378 clients 1378 recovered and 0 were evicted.
Sep 05 06:03:44 fir-md1-s2 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Sep 05 06:03:44 fir-md1-s2 kernel: LustreError: 39446:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567688324, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8fd0c9178fc0/0x5731634ee5f694ed lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce139971cef expref: -99 pid: 39446 timeout: 0 lvb_type: 0
Sep 05 06:03:44 fir-md1-s2 kernel: LustreError: 41390:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fc0fbfd1740) refcount nonzero (1) after lock cleanup; forcing cleanup.
Sep 05 06:03:44 fir-md1-s2 kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Sep 05 06:03:44 fir-md1-s2 kernel: Lustre: Skipped 1433 previous similar messages
Sep 05 06:08:44 fir-md1-s2 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Sep 05 06:08:44 fir-md1-s2 kernel: LustreError: 39446:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567688624, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8faf58670fc0/0x5731634ee98c599c lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce13aeea894 expref: -99 pid: 39446 timeout: 0 lvb_type: 0
Sep 05 06:08:44 fir-md1-s2 kernel: LustreError: 41487:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fd0882f1ec0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Sep 05 06:08:44 fir-md1-s2 kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Sep 05 06:13:50 fir-md1-s2 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Sep 05 06:13:50 fir-md1-s2 kernel: LustreError: 39446:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567688930, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8fd06d69ee40/0x5731634eeb39f439 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce1401f2e94 expref: -99 pid: 39446 timeout: 0 lvb_type: 0
Sep 05 06:13:50 fir-md1-s2 kernel: LustreError: 41570:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fd070e93d40) refcount nonzero (1) after lock cleanup; forcing cleanup.
Sep 05 06:13:50 fir-md1-s2 kernel: Lustre: MGC10.0.10.51@o2ib7: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
Sep 05 06:18:50 fir-md1-s2 kernel: LustreError: 166-1: MGC10.0.10.51@o2ib7: Connection to MGS (at 10.0.10.51@o2ib7) was lost; in progress operations using this service will fail
Sep 05 06:18:50 fir-md1-s2 kernel: LustreError: 39446:0:(ldlm_request.c:147:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1567689230, 300s ago), entering recovery for MGS@10.0.10.51@o2ib7 ns: MGC10.0.10.51@o2ib7 lock: ffff8fd0692d8480/0x5731634eebbc06b4 lrc: 4/1,0 mode: --/CR res: [0x726966:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1000000000000 nid: local remote: 0x98816ce146c24e88 expref: -99 pid: 39446 timeout: 0 lvb_type: 0
Sep 05 06:18:50 fir-md1-s2 kernel: LustreError: 41648:0:(ldlm_resource.c:1147:ldlm_resource_complain()) MGC10.0.10.51@o2ib7: namespace resource [0x726966:0x2:0x0].0x0 (ffff8fd0687780c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
Sep 05 06:21:47 fir-md1-s2 kernel: Lustre: DEBUG MARKER: Thu Sep 5 06:21:47 2019