[Wed Mar 18 10:21:10 2020]CentOS Linux 7 (Core)
[Wed Mar 18 10:21:10 2020]Kernel 3.10.0-957.27.2.el7_lustre.pl2.x86_64 on an x86_64
[Wed Mar 18 10:21:10 2020]
[Wed Mar 18 10:21:10 2020]fir-md1-s2 login: [ 199.839177] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
[Wed Mar 18 10:23:23 2020][ 199.846744] alg: No test for adler32 (adler32-zlib)
[Wed Mar 18 10:23:24 2020][ 200.646830] Lustre: Lustre: Build Version: 2.12.4
[Wed Mar 18 10:23:24 2020][ 200.750653] LNet: 20171:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down
[Wed Mar 18 10:23:24 2020][ 200.760432] LNet: Using FastReg for registration
[Wed Mar 18 10:23:24 2020][ 200.777380] LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180]
[Wed Mar 18 10:25:12 2020][ 308.753055] LNetError: 20215:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
[Wed Mar 18 10:25:12 2020][ 308.763230] LNetError: 20215:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.0.10.216@o2ib7 (6): c: 7, oc: 0, rc: 8
[Wed Mar 18 10:25:12 2020][ 308.785216] LNetError: 20224:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.0.10.216@o2ib7 added to recovery queue. Health = 900
[Wed Mar 18 10:26:55 2020][ 411.433064] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
[Wed Mar 18 10:26:55 2020][ 411.518678] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[Wed Mar 18 10:26:55 2020][ 411.970062] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.49.26.15@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 10:26:55 2020][ 411.987430] LustreError: Skipped 1 previous similar message
[Wed Mar 18 10:26:55 2020][ 412.050602] Lustre: fir-MDT0001: Not available for connect from 10.50.7.56@o2ib2 (not set up)
[Wed Mar 18 10:26:56 2020][ 412.364435] Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[Wed Mar 18 10:26:56 2020][ 412.375828] Lustre: 20423:0:(llog_cat.c:1059:llog_cat_reverse_process()) fir-MDD0001: catalog [0x5:0xa:0x0] crosses index zero
[Wed Mar 18 10:26:56 2020][ 412.504767] Lustre: fir-MDD0001: changelog on
[Wed Mar 18 10:26:56 2020][ 412.509142] Lustre: 20423:0:(llog_cat.c:894:llog_cat_process_or_fork()) fir-MDD0001: catlog [0x5:0xa:0x0] crosses index zero
[Wed Mar 18 10:26:56 2020][ 412.524435] Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1290 clients reconnect
[Wed Mar 18 10:26:57 2020][ 413.531025] Lustre: fir-MDT0001: Connection restored to e0b3c403-4bb2-4 (at 10.50.14.6@o2ib2)
[Wed Mar 18 10:26:57 2020][ 413.539563] Lustre: Skipped 90 previous similar messages
[Wed Mar 18 10:26:57 2020][ 414.042375] Lustre: fir-MDT0001: Connection restored to e361cff3-f0f6-4 (at 10.50.9.49@o2ib2)
[Wed Mar 18 10:26:57 2020][ 414.050903] Lustre: Skipped 134 previous similar messages
[Wed Mar 18 10:26:58 2020][ 415.042327] Lustre: fir-MDT0001: Connection restored to b04d2d5e-f4ea-4 (at 10.49.26.23@o2ib1)
[Wed Mar 18 10:26:58 2020][ 415.050940] Lustre: Skipped 103 previous similar messages
[Wed Mar 18 10:27:00 2020][ 417.042845] Lustre: fir-MDT0001: Connection restored to 80cdbcdc-4411-4 (at 10.50.10.70@o2ib2)
[Wed Mar 18 10:27:00 2020][ 417.051456] Lustre: Skipped 208 previous similar messages
[Wed Mar 18 10:27:04 2020][ 421.043355] Lustre: fir-MDT0001: Connection restored to 3e1a7dd1-f48f-4 (at 10.50.2.17@o2ib2)
[Wed Mar 18 10:27:04 2020][ 421.051880] Lustre: Skipped 588 previous similar messages
[Wed Mar 18 10:28:23 2020][ 499.689451] Lustre: fir-MDT0001: Connection restored to fir-MDT0001-lwp-OST001e_UUID (at 10.0.10.105@o2ib7)
[Wed Mar 18 10:28:23 2020][ 499.699192] Lustre: Skipped 152 previous similar messages
[Wed Mar 18 10:28:28 2020][ 504.799579] Lustre: fir-MDT0001: Recovery over after 1:33, of 1290 clients 1290 recovered and 0 were evicted.
[Wed Mar 18 10:28:28 2020][ 504.831276] Lustre: 21096:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:28:28 2020][ 504.844351] LustreError: 21097:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:28:29 2020][ 505.330811] Lustre: 21239:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:28:29 2020][ 505.342810] Lustre: 21239:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 7461 previous similar messages
[Wed Mar 18 10:28:29 2020][ 505.352159] LustreError: 21217:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:28:29 2020][ 505.362195] LustreError: 21217:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 7778 previous similar messages
[Wed Mar 18 10:28:30 2020][ 506.330830] Lustre: 21135:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:28:30 2020][ 506.342839] Lustre: 21135:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 19205 previous similar messages
[Wed Mar 18 10:28:30 2020][ 506.352713] LustreError: 21218:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:28:30 2020][ 506.362720] LustreError: 21218:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 16801 previous similar messages
[Wed Mar 18 10:28:32 2020][ 508.330828] Lustre: 21176:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:28:32 2020][ 508.342830] Lustre: 21176:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 40185 previous similar messages
[Wed Mar 18 10:28:32 2020][ 508.353020] LustreError: 21177:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:28:32 2020][ 508.363033] LustreError: 21177:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 34114 previous similar messages
[Wed Mar 18 10:28:36 2020][ 512.330888] Lustre: 21030:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:28:36 2020][ 512.342885] Lustre: 21030:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 79423 previous similar messages
[Wed Mar 18 10:28:36 2020][ 512.352917] LustreError: 21213:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:28:36 2020][ 512.362931] LustreError: 21213:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 67416 previous similar messages
[Wed Mar 18 10:28:44 2020][ 520.331132] Lustre: 21084:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:28:44 2020][ 520.343129] Lustre: 21084:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 159923 previous similar messages
[Wed Mar 18 10:28:44 2020][ 520.353111] LustreError: 21179:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:28:44 2020][ 520.363117] LustreError: 21179:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 137757 previous similar messages
[Wed Mar 18 10:28:48 2020][ 524.957681] LustreError: 21095:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = 'alignment.eigen.indiv', t = [0x240049459:0x9f67:0x0], p = [0x2400478b1:0x1e9d6:0x0]: rc = -28
[Wed Mar 18 10:28:54 2020][ 530.656413] LustreError: 20878:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 6, name = 'alignment.eigen.indiv', t = [0x240049419:0xea19:0x0], p = [0x24004ac39:0x22e6:0x0]: rc = -5
[Wed Mar 18 10:28:54 2020][ 530.676655] LustreError: 20878:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 2 previous similar messages
[Wed Mar 18 10:28:55 2020][ 531.887370] LustreError: 20887:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = 'alignment.eigen.indiv', t = [0x240049419:0xea1f:0x0], p = [0x24004ac39:0x22e6:0x0]: rc = -28
[Wed Mar 18 10:28:55 2020][ 531.907696] LustreError: 20887:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 2 previous similar messages
[Wed Mar 18 10:28:58 2020][ 534.437187] LustreError: 21091:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = '.state.rob002.nwwUCh', t = [0x24003e2c1:0x1a4be:0x0], p = [0x24003e2c1:0x1a2cd:0x0]: rc = -28
[Wed Mar 18 10:28:58 2020][ 534.457604] LustreError: 21091:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 476 previous similar messages
[Wed Mar 18 10:29:00 2020][ 536.331595] Lustre: 21208:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:29:00 2020][ 536.343598] Lustre: 21208:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 318896 previous similar messages
[Wed Mar 18 10:29:00 2020][ 536.353580] LustreError: 21233:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:29:00 2020][ 536.363592] LustreError: 21233:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 278389 previous similar messages
[Wed Mar 18 10:29:02 2020][ 538.793759] LustreError: 20903:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = '.state.rob008.lIDtrw', t = [0x24003e2c1:0x1a4c4:0x0], p = [0x24003e2c1:0x1a2cd:0x0]: rc = -5
[Wed Mar 18 10:29:02 2020][ 538.814093] LustreError: 20903:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 4 previous similar messages
[Wed Mar 18 10:29:10 2020][ 546.814069] LustreError: 20826:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = '.state.rob019.9XaMV2', t = [0x24003e2c1:0x1a4cf:0x0], p = [0x24003e2c1:0x1a2cd:0x0]: rc = -28
[Wed Mar 18 10:29:10 2020][ 546.834485] LustreError: 20826:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 10 previous similar messages
[Wed Mar 18 10:29:26 2020][ 562.867247] LustreError: 20872:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = '.state.rob038.uqeKld', t = [0x24003e2c1:0x1a702:0x0], p = [0x24003e2c1:0x1a4dd:0x0]: rc = -28
[Wed Mar 18 10:29:26 2020][ 562.887661] LustreError: 20872:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 536 previous similar messages
[Wed Mar 18 10:29:32 2020][ 568.332611] Lustre: 21146:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:29:32 2020][ 568.344609] Lustre: 21146:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 632422 previous similar messages
[Wed Mar 18 10:29:32 2020][ 568.354939] LustreError: 21214:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:29:32 2020][ 568.364951] LustreError: 21214:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 550759 previous similar messages
[Wed Mar 18 10:29:58 2020][ 595.094431] LustreError: 20852:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) fir-MDD0001: cannot store changelog record: type = 1, name = '.state.rob144.edbipg', t = [0x24003e2c1:0x1a797:0x0], p = [0x24003e2c1:0x1a4dd:0x0]: rc = -5
[Wed Mar 18 10:29:58 2020][ 595.114751] LustreError: 20852:0:(mdd_dir.c:1065:mdd_changelog_ns_store()) Skipped 154 previous similar messages
[Wed Mar 18 10:30:36 2020][ 632.334659] Lustre: 21210:0:(llog_cat.c:98:llog_cat_new_log()) fir-MDD0001: there are no more free slots in catalog changelog_catalog
[Wed Mar 18 10:30:36 2020][ 632.346653] Lustre: 21210:0:(llog_cat.c:98:llog_cat_new_log()) Skipped 1243835 previous similar messages
[Wed Mar 18 10:30:36 2020][ 632.356529] LustreError: 20480:0:(llog_cat.c:530:llog_cat_current_log()) fir-MDD0001: next log does not exist!
[Wed Mar 18 10:30:36 2020][ 632.366527] LustreError: 20480:0:(llog_cat.c:530:llog_cat_current_log()) Skipped 1115377 previous similar messages
[Wed Mar 18 10:30:41 2020][ 637.373911] Lustre: Failing over fir-MDT0001
[Wed Mar 18 10:30:41 2020][ 637.538894] Lustre: fir-MDT0001: Not available for connect from 10.49.0.71@o2ib1 (stopping)
[Wed Mar 18 10:30:41 2020][ 637.547259] Lustre: Skipped 9 previous similar messages
[Wed Mar 18 10:30:42 2020][ 638.564213] Lustre: fir-MDT0001: Not available for connect from 10.50.2.1@o2ib2 (stopping)
[Wed Mar 18 10:30:42 2020][ 638.572489] Lustre: Skipped 91 previous similar messages
[Wed Mar 18 10:30:44 2020][ 640.572209] Lustre: fir-MDT0001: Not available for connect from 10.49.25.11@o2ib1 (stopping)
[Wed Mar 18 10:30:44 2020][ 640.580646] Lustre: Skipped 269 previous similar messages
[Wed Mar 18 10:30:45 2020][ 641.349649] LustreError: 137-5: fir-MDT0001_UUID: not available for connect from 10.49.18.32@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 10:30:45 2020][ 641.367024] LustreError: Skipped 9 previous similar messages
[Wed Mar 18 10:30:45 2020][ 641.885493] Lustre: server umount fir-MDT0001 complete
[Wed Mar 18 10:30:46 2020][ 642.798839] LNetError: 20284:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Can't accept conn from 10.0.10.224@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7
[Wed Mar 18 10:30:46 2020][ 643.310440] LNetError: 21557:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Can't accept conn from 10.0.10.218@o2ib7 on NA (ib0:1:10.0.10.52): bad dst nid 10.0.10.52@o2ib7
[Wed Mar 18 10:30:47 2020][ 643.325744] LNetError: 21557:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Skipped 21 previous similar messages
[Wed Mar 18 10:30:48 2020][ 644.800599] LNet: Removed LNI 10.0.10.52@o2ib7
[Wed Mar 18 10:32:14 2020][ 731.122093] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
[Wed Mar 18 10:32:14 2020][ 731.208724] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[Wed Mar 18 10:35:11 2020][ 908.220225] LNet: HW NUMA nodes: 4, HW CPU cores: 48, npartitions: 4
[Wed Mar 18 10:35:11 2020][ 908.227858] alg: No test for adler32 (adler32-zlib)
[Wed Mar 18 10:35:12 2020][ 909.028572] Lustre: Lustre: Build Version: 2.12.4
[Wed Mar 18 10:35:12 2020][ 909.132162] LNet: 21811:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface em2: it's down
[Wed Mar 18 10:35:12 2020][ 909.141782] LNet: Using FastReg for registration
[Wed Mar 18 10:35:12 2020][ 909.157585] LNet: Added LNI 10.0.10.52@o2ib7 [8/256/0/180]
[Wed Mar 18 10:35:14 2020][ 910.343024] LDISKFS-fs (dm-1): file extents enabled, maximum tree depth=5
[Wed Mar 18 10:35:14 2020][ 910.429954] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[Wed Mar 18 10:35:15 2020][ 911.334090] Lustre: fir-MDT0001: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[Wed Mar 18 10:35:15 2020][ 911.346291] Lustre: fir-MDT0001: in recovery but waiting for the first client to connect
[Wed Mar 18 10:35:16 2020][ 912.807358] Lustre: fir-MDT0001: Will be in recovery for at least 2:30, or until 1290 clients reconnect
[Wed Mar 18 10:35:17 2020][ 913.816154] Lustre: fir-MDT0001: Connection restored to (at 10.50.4.10@o2ib2)
[Wed Mar 18 10:35:20 2020][ 916.346313] Lustre: fir-MDT0001: Connection restored to dc2c90e2-2fae-4 (at 10.49.7.10@o2ib1)
[Wed Mar 18 10:35:20 2020][ 916.354858] Lustre: Skipped 2 previous similar messages
[Wed Mar 18 10:35:21 2020][ 917.347015] Lustre: fir-MDT0001: Connection restored to 79513c1e-38af-4 (at 10.50.1.12@o2ib2)
[Wed Mar 18 10:35:21 2020][ 917.355540] Lustre: Skipped 262 previous similar messages
[Wed Mar 18 10:35:23 2020][ 919.346610] Lustre: fir-MDT0001: Connection restored to 7a4b5ab1-0a05-4 (at 10.50.10.40@o2ib2)
[Wed Mar 18 10:35:23 2020][ 919.355224] Lustre: Skipped 519 previous similar messages
[Wed Mar 18 10:35:29 2020][ 925.853335] Lustre: fir-MDT0001: Connection restored to 9bb420b9-4b7e-4 (at 10.49.8.24@o2ib1)
[Wed Mar 18 10:35:29 2020][ 925.861876] Lustre: Skipped 484 previous similar messages
[Wed Mar 18 10:35:56 2020][ 952.450239] Lustre: fir-MDT0001: Connection restored to 95eea94a-c8bd-4 (at 10.50.6.1@o2ib2)
[Wed Mar 18 10:36:20 2020][ 977.094182] Lustre: fir-MDT0001: Connection restored to fir-MDT0002-mdtlov_UUID (at 10.0.10.53@o2ib7)
[Wed Mar 18 10:36:20 2020][ 977.103427] Lustre: Skipped 94 previous similar messages
[Wed Mar 18 10:36:20 2020][ 977.141845] Lustre: fir-MDT0001: Recovery over after 1:04, of 1290 clients 1290 recovered and 0 were evicted.
[Wed Mar 18 10:41:00 2020][ 1256.698668] LustreError: 11-0: fir-MDT0002-osp-MDT0001: operation mds_statfs to node 10.0.10.53@o2ib7 failed: rc = -107
[Wed Mar 18 10:41:00 2020][ 1256.709459] Lustre: fir-MDT0002-osp-MDT0001: Connection to fir-MDT0002 (at 10.0.10.53@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[Wed Mar 18 10:42:09 2020][ 1326.252157] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.0.10.3@o2ib7 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 10:42:10 2020][ 1326.927146] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.6.28@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 10:42:12 2020][ 1328.685554] LDISKFS-fs (dm-2): file extents enabled, maximum tree depth=5
[Wed Mar 18 10:42:12 2020][ 1328.771290] LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,no_mbcache,nodelalloc
[Wed Mar 18 10:42:12 2020][ 1328.907069] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.7.59@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 10:42:13 2020][ 1329.683859] Lustre: fir-MDT0002: Not available for connect from 10.50.1.10@o2ib2 (not set up)
[Wed Mar 18 10:42:14 2020][ 1330.367735] Lustre: fir-MDT0002: Not available for connect from 10.0.10.51@o2ib7 (not set up)
[Wed Mar 18 10:42:14 2020][ 1330.376261] Lustre: Skipped 1 previous similar message
[Wed Mar 18 10:42:15 2020][ 1331.417567] Lustre: fir-MDT0002: Not available for connect from 10.0.10.113@o2ib7 (not set up)
[Wed Mar 18 10:42:15 2020][ 1331.426189] Lustre: Skipped 25 previous similar messages
[Wed Mar 18 10:42:15 2020][ 1331.542791] Lustre: fir-MDT0001: Connection restored to fir-MDT0002-mdtlov_UUID (at 0@lo)
[Wed Mar 18 10:42:15 2020][ 1331.550982] Lustre: Skipped 7 previous similar messages
[Wed Mar 18 10:42:15 2020][ 1331.842596] Lustre: fir-MDT0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[Wed Mar 18 10:42:15 2020][ 1332.016545] Lustre: fir-MDD0002: changelog on
[Wed Mar 18 10:42:15 2020][ 1332.028636] Lustre: fir-MDT0002: in recovery but waiting for the first client to connect
[Wed Mar 18 10:42:15 2020][ 1332.073057] Lustre: fir-MDT0002: Will be in recovery for at least 2:30, or until 1290 clients reconnect
[Wed Mar 18 10:43:54 2020][ 1430.722510] Lustre: fir-MDT0002: Connection restored to 10.0.10.51@o2ib7 (at 10.0.10.51@o2ib7)
[Wed Mar 18 10:43:54 2020][ 1430.731131] Lustre: Skipped 1376 previous similar messages
[Wed Mar 18 10:43:55 2020][ 1431.893954] Lustre: fir-MDT0002: Recovery over after 1:40, of 1290 clients 1290 recovered and 0 were evicted.
[Wed Mar 18 11:13:46 2020][ 3222.730973] Lustre: Failing over fir-MDT0002
[Wed Mar 18 11:13:46 2020][ 3222.767144] Lustre: fir-MDT0002: Not available for connect from 10.50.4.26@o2ib2 (stopping)
[Wed Mar 18 11:13:46 2020][ 3222.775500] Lustre: Skipped 2 previous similar messages
[Wed Mar 18 11:13:46 2020][ 3223.286675] Lustre: fir-MDT0002: Not available for connect from 10.50.8.2@o2ib2 (stopping)
[Wed Mar 18 11:13:46 2020][ 3223.294946] Lustre: Skipped 119 previous similar messages
[Wed Mar 18 11:13:47 2020][ 3224.291658] Lustre: fir-MDT0002: Not available for connect from 10.50.10.49@o2ib2 (stopping)
[Wed Mar 18 11:13:47 2020][ 3224.300099] Lustre: Skipped 176 previous similar messages
[Wed Mar 18 11:13:48 2020][ 3224.800652] LustreError: 11-0: fir-MDT0002-osp-MDT0001: operation mds_statfs to node 0@lo failed: rc = -107
[Wed Mar 18 11:13:48 2020][ 3224.810404] Lustre: fir-MDT0002-osp-MDT0001: Connection to fir-MDT0002 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[Wed Mar 18 11:13:49 2020][ 3226.312907] Lustre: fir-MDT0002: Not available for connect from 10.50.2.7@o2ib2 (stopping)
[Wed Mar 18 11:13:49 2020][ 3226.321182] Lustre: Skipped 289 previous similar messages
[Wed Mar 18 11:13:54 2020][ 3230.643415] Lustre: fir-MDT0002: Not available for connect from 10.50.6.17@o2ib2 (stopping)
[Wed Mar 18 11:13:54 2020][ 3230.651767] Lustre: Skipped 125 previous similar messages
[Wed Mar 18 11:13:54 2020][ 3230.720972] LustreError: 22568:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.50.13.9@o2ib2 arrived at 1584555234 with bad export cookie 13699018698230690058
[Wed Mar 18 11:13:54 2020][ 3230.736522] LustreError: 22568:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) Skipped 3 previous similar messages
[Wed Mar 18 11:13:56 2020][ 3233.053872] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.7.66@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:13:56 2020][ 3233.071157] LustreError: Skipped 1 previous similar message
[Wed Mar 18 11:13:56 2020][ 3233.091450] Lustre: server umount fir-MDT0002 complete
[Wed Mar 18 11:13:57 2020][ 3233.963037] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.4.2@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:13:58 2020][ 3234.968970] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.9.72@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:13:58 2020][ 3234.986249] LustreError: Skipped 3 previous similar messages
[Wed Mar 18 11:14:00 2020][ 3237.066184] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.7.34@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:14:00 2020][ 3237.083470] LustreError: Skipped 6 previous similar messages
[Wed Mar 18 11:14:04 2020][ 3241.072136] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.49.8.24@o2ib1 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:14:04 2020][ 3241.089423] LustreError: Skipped 151 previous similar messages
[Wed Mar 18 11:14:07 2020][ 3244.293339] Lustre: fir-MDT0001: Connection restored to 10.0.10.53@o2ib7 (at 10.0.10.53@o2ib7)
[Wed Mar 18 11:14:07 2020][ 3244.301959] Lustre: Skipped 8 previous similar messages
[Wed Mar 18 11:14:13 2020][ 3249.517385] LustreError: 137-5: fir-MDT0002_UUID: not available for connect from 10.50.6.29@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:14:13 2020][ 3249.534668] LustreError: Skipped 483 previous similar messages
[Wed Mar 18 11:14:48 2020][ 3284.663654] Lustre: fir-MDT0002-osp-MDT0001: Connection restored to 10.0.10.53@o2ib7 (at 10.0.10.53@o2ib7)
[Wed Mar 18 11:16:43 2020][ 3400.181442] LustreError: 11-0: fir-MDT0003-osp-MDT0001: operation mds_statfs to node 10.0.10.54@o2ib7 failed: rc = -107
[Wed Mar 18 11:16:43 2020][ 3400.192234] Lustre: fir-MDT0003-osp-MDT0001: Connection to fir-MDT0003 (at 10.0.10.54@o2ib7) was lost; in progress operations using this service will wait for recovery to complete
[Wed Mar 18 11:17:27 2020][ 3443.575752] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.4.14@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:17:27 2020][ 3443.593031] LustreError: Skipped 2 previous similar messages
[Wed Mar 18 11:18:02 2020][ 3479.128062] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.0.62@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:18:02 2020][ 3479.145340] LustreError: Skipped 1383 previous similar messages
[Wed Mar 18 11:18:56 2020][ 3532.738794] Lustre: fir-MDT0001: Connection restored to 10.0.10.53@o2ib7 (at 10.0.10.53@o2ib7)
[Wed Mar 18 11:19:11 2020][ 3548.177848] LustreError: 137-5: fir-MDT0003_UUID: not available for connect from 10.50.9.32@o2ib2 (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Mar 18 11:20:36 2020][ 3633.287959] Lustre: fir-MDT0003-osp-MDT0001: Connection restored to 10.0.10.53@o2ib7 (at 10.0.10.53@o2ib7)