[ 94.853950] igb 0000:04:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 00:1e:67:65:25:1d [ 94.854023] igb 0000:04:00.0: eth0: PBA No: 100000-000 [ 94.854027] igb 0000:04:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s) [ 94.854334] igb 0000:04:00.1: irq 60 for MSI/MSI-X [ 94.854378] igb 0000:04:00.1: irq 60 for MSI/MSI-X [ 94.854397] igb 0000:04:00.1: irq 61 for MSI/MSI-X [ 94.854413] igb 0000:04:00.1: irq 62 for MSI/MSI-X [ 94.854442] igb 0000:04:00.1: irq 63 for MSI/MSI-X [ 94.854458] igb 0000:04:00.1: irq 64 for MSI/MSI-X [ 94.854474] igb 0000:04:00.1: irq 65 for MSI/MSI-X [ 94.854490] igb 0000:04:00.1: irq 66 for MSI/MSI-X [ 94.854506] igb 0000:04:00.1: irq 67 for MSI/MSI-X [ 94.854522] igb 0000:04:00.1: irq 68 for MSI/MSI-X [ 94.854553] igb 0000:04:00.1: PHY reset is blocked due to SOL/IDER session. [ 94.985120] [TTM] Initializing pool allocator [ 94.999450] [TTM] Initializing DMA pool allocator [ 95.024604] AVX version of gcm_enc/dec engaged. [ 95.039509] AES CTR mode by8 optimization enabled [ 95.056637] fbcon: mgadrmfb (fb0) is primary device [ 95.058402] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni) [ 95.058445] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni) [ 95.059402] igb 0000:04:00.1: added PHC on eth1 [ 95.059403] igb 0000:04:00.1: Intel(R) Gigabit Ethernet Network Connection [ 95.059405] igb 0000:04:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 00:1e:67:65:25:1e [ 95.059480] igb 0000:04:00.1: eth1: PBA No: 100000-000 [ 95.059483] igb 0000:04:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s) [ 95.059791] igb 0000:04:00.2: irq 70 for MSI/MSI-X [ 95.059847] igb 0000:04:00.2: irq 70 for MSI/MSI-X [ 95.059864] igb 0000:04:00.2: irq 71 for MSI/MSI-X [ 95.059882] igb 0000:04:00.2: irq 72 for MSI/MSI-X [ 95.059900] igb 0000:04:00.2: irq 73 for MSI/MSI-X [ 95.059917] igb 0000:04:00.2: irq 74 for MSI/MSI-X [ 95.059934] igb 0000:04:00.2: irq 75 for MSI/MSI-X [ 95.059953] igb 0000:04:00.2: irq 76 for MSI/MSI-X [ 95.059974] igb 0000:04:00.2: irq 77 for MSI/MSI-X [ 95.059993] igb 0000:04:00.2: irq 78 for MSI/MSI-X [ 95.115148] igb 0000:04:00.2: added PHC on eth2 [ 95.115151] igb 0000:04:00.2: Intel(R) Gigabit Ethernet Network Connection [ 95.115153] igb 0000:04:00.2: eth2: (PCIe:5.0Gb/s:Width x4) 00:1e:67:65:25:1f [ 95.115226] igb 0000:04:00.2: eth2: PBA No: 100000-000 [ 95.115230] igb 0000:04:00.2: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s) [ 95.115582] igb 0000:04:00.3: irq 79 for MSI/MSI-X [ 95.115635] igb 0000:04:00.3: irq 79 for MSI/MSI-X [ 95.115651] igb 0000:04:00.3: irq 80 for MSI/MSI-X [ 95.115667] igb 0000:04:00.3: irq 122 for MSI/MSI-X [ 95.115682] igb 0000:04:00.3: irq 123 for MSI/MSI-X [ 95.115703] igb 0000:04:00.3: irq 124 for MSI/MSI-X [ 95.115719] igb 0000:04:00.3: irq 125 for MSI/MSI-X [ 95.115736] igb 0000:04:00.3: irq 126 for MSI/MSI-X [ 95.115753] igb 0000:04:00.3: irq 127 for MSI/MSI-X [ 95.115770] igb 0000:04:00.3: irq 128 for MSI/MSI-X [ 95.118643] kvm: disabled by bios [ 95.129991] kvm: disabled by bios [ 95.132840] intel_rapl: Found RAPL domain package [ 95.132843] intel_rapl: Found RAPL domain core [ 95.132851] intel_rapl: Found RAPL domain dram [ 95.132885] intel_rapl: Found RAPL domain package [ 95.132889] intel_rapl: Found RAPL domain core [ 95.132895] intel_rapl: Found RAPL domain dram [ 95.167850] kvm: disabled by bios [ 95.177915] igb 0000:04:00.3: added PHC on eth3 [ 95.177916] igb 0000:04:00.3: Intel(R) Gigabit Ethernet Network Connection [ 95.177918] igb 0000:04:00.3: eth3: (PCIe:5.0Gb/s:Width x4) 00:1e:67:65:25:20 [ 95.177991] igb 0000:04:00.3: eth3: PBA No: 100000-000 [ 95.177993] igb 0000:04:00.3: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s) [ 95.182005] kvm: disabled by bios [ 95.193951] kvm: disabled by bios [ 95.202995] kvm: disabled by bios [ 95.214029] kvm: disabled by bios [ 95.226966] kvm: disabled by bios [ 95.251327] kvm: disabled by bios [ 95.269101] kvm: disabled by bios [ 95.293290] Console: switching to colour frame buffer device 128x48 [ 95.394519] mgag200 0000:0b:00.0: fb0: mgadrmfb frame buffer device [ 95.417166] iTCO_vendor_support: vendor-support=0 [ 95.560360] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11 [ 95.560416] iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS [ 95.992600] kvm: disabled by bios [ 96.003163] [drm] Initialized mgag200 1.0.0 20110418 for 0000:0b:00.0 on minor 0 [ 96.046182] kvm: disabled by bios [ 96.071054] kvm: disabled by bios [ 96.106126] kvm: disabled by bios [ 96.133101] kvm: disabled by bios [ 96.152088] kvm: disabled by bios [ 97.952919] EXT4-fs (sde11): mounting ext3 file system using the ext4 subsystem [ 98.141191] EXT4-fs (sde11): mounted filesystem with ordered data mode. Opts: (null) [ 98.270301] type=1305 audit(1590682225.565:2): audit_pid=2493 old=0 auid=4294967295 ses=4294967295 res=1 [ 99.416083] RPC: Registered named UNIX socket transport module. [ 99.439172] RPC: Registered udp transport module. [ 99.439172] RPC: Registered tcp transport module. [ 99.439173] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 99.597374] Process accounting resumed [ 99.861844] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) [ 99.862728] bonding: bond0 is being created... [ 99.862749] bonding: bond0 already exists [ 99.995379] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready [ 100.061675] bond0: Enslaving eth0 as a backup interface with a down link [ 100.062068] igb 0000:04:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX [ 100.072934] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready [ 100.073232] bond0: link status definitely up for interface eth0, 1000 Mbps full duplex [ 100.073238] bond0: making interface eth0 the new active one [ 100.073547] bond0: first active interface up! [ 100.073558] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 100.563674] mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.7-3.2.9 [ 100.564462] mlx4_ib_add: counter index 0 for port 1 allocated 0 [ 100.564464] mlx4_ib_add: counter index 1 for port 2 allocated 0 [ 130.930294] card: mlx4_0, QP: 0x220, inline size: 120 [ 130.940335] card: mlx4_0, QP: 0x300, inline size: 120 [ 131.486067] IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready [ 131.508259] IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready [ 135.659425] Loading iSCSI transport class v2.0-870. [ 135.696969] LNet: HW NUMA nodes: 2, HW CPU cores: 16, npartitions: 2 [ 135.698730] alg: No test for adler32 (adler32-zlib) [ 136.882995] mpt2sas 0000:84:00.0: invalid short VPD tag 00 at offset 1 [ 136.909197] Lustre: Lustre: Build Version: 2.12.4 [ 137.035018] LNet: 3508:0:(config.c:1627:lnet_inet_enumerate()) lnet: Ignoring interface eth1: it's down [ 137.065981] LNet: Using FMR for registration [ 137.078179] LNetError: 1330:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Can't accept conn from 10.151.37.187@o2ib on NA (ib1:0:10.151.27.60): bad dst nid 10.151.27.60@o2ib [ 137.579725] LNetError: 1114:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Can't accept conn from 10.151.57.150@o2ib on NA (ib1:0:10.151.27.60): bad dst nid 10.151.27.60@o2ib [ 137.630957] LNetError: 1114:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Skipped 89 previous similar messages [ 138.616757] LNetError: 1098:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Can't accept conn from 10.151.33.86@o2ib on NA (ib1:0:10.151.27.60): bad dst nid 10.151.27.60@o2ib [ 138.667672] LNetError: 1098:0:(o2iblnd_cb.c:2496:kiblnd_passive_connect()) Skipped 73 previous similar messages [ 139.912597] LNet: Added LNI 10.151.27.60@o2ib [32/125536/0/0] [ 191.916718] LNetError: 5529:0:(api-ni.c:467:retry_count_set()) Can not set retry_count when health feature is turned off [ 281.336496] LDISKFS-fs (dm-0): recovery complete [ 281.336595] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,errors=panic,user_xattr,no_mbcache,nodelalloc [ 281.780073] Lustre: MGS: Connection restored to f8285294-d6a2-12ab-fab1-c756859fef38 (at 0@lo) [ 282.055217] LDISKFS-fs (dm-1): failed to open journal device unknown-block(253,0): -16 [ 282.081441] LustreError: 5728:0:(osd_handler.c:7681:osd_mount()) nbp8-MDT0000-osd: can't mount /dev/mapper/nbp8--vg-mdt8: -22 [ 282.118675] LustreError: 5728:0:(obd_config.c:559:class_setup()) setup nbp8-MDT0000-osd failed (-22) [ 282.148735] LustreError: 5728:0:(obd_mount.c:202:lustre_start_simple()) nbp8-MDT0000-osd setup error -22 [ 282.148747] LustreError: 5728:0:(obd_mount_server.c:1956:server_fill_super()) Unable to start osd on /dev/mapper/nbp8--vg-mdt8: -22 [ 282.148754] LustreError: 5728:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-22) [ 284.182720] Lustre: MGS: Connection restored to 7a0d5faf-d18b-5224-892e-aac666559533 (at 10.141.4.18@o2ib417) [ 286.341967] Lustre: MGS: Connection restored to 6be2d89d-5ec5-bc8d-4f9f-7ad62494e051 (at 10.141.4.44@o2ib417) [ 289.274507] Lustre: MGS: Connection restored to acbf430f-20c8-30d6-b2b3-5c0b43b36fb5 (at 10.151.45.67@o2ib) [ 289.274511] Lustre: Skipped 1 previous similar message [ 299.348130] Lustre: MGS: Connection restored to cf2676ba-e2eb-648d-9a5e-b72990b2fa58 (at 10.151.14.13@o2ib) [ 299.348135] Lustre: Skipped 3 previous similar messages [ 310.273995] Lustre: MGS: Connection restored to 079b0579-4e69-73e4-5cf9-8e917a38eaa0 (at 10.151.29.26@o2ib) [ 310.274000] Lustre: Skipped 4 previous similar messages [ 326.538140] Lustre: MGS: Connection restored to 05b4b04a-d1a9-d088-ec68-73ce3fef2eb6 (at 10.151.55.176@o2ib) [ 326.538144] Lustre: Skipped 80 previous similar messages [ 358.552782] Lustre: MGS: Connection restored to 090561c6-63eb-6871-5cea-aa302b4c940b (at 10.141.5.1@o2ib417) [ 358.552786] Lustre: Skipped 176 previous similar messages [ 422.645158] Lustre: MGS: Connection restored to 2f0bd31c-bdba-673a-224d-f0e4e088e8f8 (at 10.151.43.82@o2ib) [ 422.645163] Lustre: Skipped 1418 previous similar messages [ 555.866969] Lustre: MGS: Connection restored to 2e734765-2f51-8ff1-93a5-b5a3e8da7db6 (at 10.151.57.71@o2ib) [ 555.866974] Lustre: Skipped 838 previous similar messages [ 678.629445] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,acl,errors=panic,user_xattr,no_mbcache,nodelalloc [ 680.564805] LustreError: 137-5: nbp8-MDT0000_UUID: not available for connect from 10.151.6.81@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server. [ 681.129977] LustreError: 137-5: nbp8-MDT0000_UUID: not available for connect from 10.151.33.11@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server. [ 681.349296] Lustre: nbp8-MDT0000: Not available for connect from 10.151.26.58@o2ib (not set up) [ 688.575144] Lustre: 6590:0:(mdt_handler.c:5562:mdt_process_config()) For interoperability, skip this mdt.group_upcall. It is obsolete. [ 688.721162] Lustre: nbp8-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900 [ 688.908621] Lustre: nbp8-MDT0000: in recovery but waiting for the first client to connect [ 689.194400] Lustre: nbp8-MDT0000: Will be in recovery for at least 2:30, or until 1894 clients reconnect [ 689.225750] Lustre: nbp8-MDT0000: Denying connection for new client b720ca35-28af-04f1-9ee8-80a9c6fbce90 (at 10.151.56.227@o2ib), waiting for 1894 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 2:29 [ 689.735394] Lustre: nbp8-MDT0000: Denying connection for new client 5830a0de-9067-2516-4003-13a9636a191a (at 10.151.45.178@o2ib), waiting for 1894 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 2:29 [ 689.800643] Lustre: Skipped 1 previous similar message [ 690.867622] Lustre: nbp8-MDT0000: Denying connection for new client c34a2457-d449-9ea1-7dca-cbc4cc67c8c2 (at 10.151.55.175@o2ib), waiting for 1894 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 2:28 [ 690.932881] Lustre: Skipped 1 previous similar message [ 693.189946] Lustre: nbp8-MDT0000: Denying connection for new client c3f7c980-8b6a-a946-322f-a8b80521f815 (at 10.151.45.111@o2ib), waiting for 1894 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in 7:42 [ 693.255187] Lustre: Skipped 9 previous similar messages [ 697.298085] Lustre: nbp8-MDT0000: Denying connection for new client 76737890-dc8f-8736-aa1e-ebc89d245725 (at 10.151.55.183@o2ib), waiting for 1894 known clients (521 recovered, 3 in progress, and 0 evicted) to recover in 7:38 [ 697.363941] Lustre: Skipped 56 previous similar messages [ 705.950333] Lustre: nbp8-MDT0000: Denying connection for new client b4a638e6-7e9d-b0ca-f786-9310ba21c442 (at 10.141.2.123@o2ib417), waiting for 1894 known clients (906 recovered, 1 in progress, and 0 evicted) to recover in 7:30 [ 706.016724] Lustre: Skipped 19 previous similar messages [ 722.030392] Lustre: nbp8-MDT0000: Denying connection for new client 78b71c94-b872-7abb-1bf3-14c025266381 (at 10.151.50.97@o2ib), waiting for 1894 known clients (1059 recovered, 5 in progress, and 0 evicted) to recover in 7:14 [ 722.096209] Lustre: Skipped 35 previous similar messages [ 754.250431] Lustre: nbp8-MDT0000: Denying connection for new client f399842c-2c3c-8378-52b0-da31abc612d8 (at 10.151.50.153@o2ib), waiting for 1894 known clients (1715 recovered, 4 in progress, and 0 evicted) to recover in 6:41 [ 754.316558] Lustre: Skipped 128 previous similar messages [ 812.561784] Lustre: nbp8-MDT0000: Connection restored to f56ba2a9-53b0-4725-3279-454ee6c59b3c (at 10.151.57.145@o2ib) [ 812.561789] Lustre: Skipped 2041 previous similar messages [ 818.481990] Lustre: nbp8-MDT0000: Denying connection for new client 8a572dfc-37f4-d145-51aa-ed010bc8fc34 (at 10.151.49.162@o2ib), waiting for 1894 known clients (1728 recovered, 4 in progress, and 0 evicted) to recover in 5:37 [ 818.548095] Lustre: Skipped 302 previous similar messages [ 946.503385] Lustre: nbp8-MDT0000: Denying connection for new client 7004b578-6a1d-48f8-bae9-07e922cc1393 (at 10.151.51.29@o2ib), waiting for 1894 known clients (1795 recovered, 4 in progress, and 0 evicted) to recover in 3:29 [ 946.569209] Lustre: Skipped 789 previous similar messages [ 1011.756652] Lustre: 7294:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff897d23ef3180 x1667407579507376/t0(0) o400->79c34bf2-9957-d454-f170-b1eae38e1620@10.151.57.42@o2ib:93/0 lens 224/0 e 0 to 0 dl 1590683168 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1011.853664] Lustre: 7294:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 32 previous similar messages [ 1012.758679] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a3b1940900 x1666712785770144/t0(0) o400->0faf747e-2bc7-0739-4b7e-df0297971f8e@10.151.34.72@o2ib:94/0 lens 224/0 e 0 to 0 dl 1590683169 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1012.855664] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 109 previous similar messages [ 1013.760730] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a39eac2d00 x1667381269930736/t0(0) o400->8efe3273-fd53-5756-284f-8aee3a20f907@10.151.52.12@o2ib:95/0 lens 224/0 e 0 to 0 dl 1590683170 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1013.857730] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 154 previous similar messages [ 1015.764780] Lustre: 7294:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff897e14ef8480 x1666711358966576/t0(0) o400->39228877-7023-df3b-2b0e-1f22f1d879f0@10.151.35.163@o2ib:97/0 lens 224/0 e 0 to 0 dl 1590683172 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1015.862075] Lustre: 7294:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 265 previous similar messages [ 1019.772929] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a35b8cc800 x1667400497082752/t0(0) o400->a4d5f58b-baca-fd2a-44a5-4bc6d3d9096f@10.151.57.135@o2ib:101/0 lens 224/0 e 0 to 0 dl 1590683176 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1019.870511] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 276 previous similar messages [ 1027.789223] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a35aed0d80 x1667405198915664/t0(0) o400->8cd3c71f-b862-34b6-1f00-6bb2369adbef@10.151.57.78@o2ib:109/0 lens 224/0 e 0 to 0 dl 1590683184 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1027.886507] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 66 previous similar messages [ 1043.815811] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a35ab4f500 x1666144756182704/t0(0) o400->6d30f9d1-87fc-f33a-cc27-92de87ea5a83@10.149.2.152@o2ib313:125/0 lens 224/0 e 0 to 0 dl 1590683200 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1043.913968] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 242 previous similar messages [ 1078.895128] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a367805580 x1658453548185792/t0(0) o400->dd3ed090-f249-325c-cd3a-5c30e4ffa197@10.151.52.106@o2ib:160/0 lens 224/0 e 0 to 0 dl 1590683235 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1078.992705] Lustre: 6563:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 568 previous similar messages [ 1143.105465] Lustre: 7286:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a3d2334800 x1666013225504240/t0(0) o400->e2df5817-2b27-d463-d5ea-e9e533eeaeec@10.151.31.66@o2ib:224/0 lens 224/0 e 0 to 0 dl 1590683299 ref 2 fl Complete:H/c0/ffffffff rc 0/-1 [ 1143.202754] Lustre: 7286:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 12 previous similar messages [ 1156.106940] Lustre: nbp8-MDT0000: recovery is timed out, evict stale exports [ 1156.130774] Lustre: nbp8-MDT0000: disconnecting 90 stale clients [ 1156.161239] Lustre: 7264:0:(ldlm_lib.c:1782:extend_recovery_timer()) nbp8-MDT0000: extended recovery timer reached hard limit: 900, extend: 1 [ 1156.631091] Lustre: nbp8-MDT0000: Recovery over after 7:47, of 1894 clients 1804 recovered and 90 were evicted. [ 1383.174024] Lustre: nbp8-MDT0000: haven't heard from client a8185608-41bc-6a4b-4bd5-bb9a73c7d301 (at 10.151.32.30@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897eaacb4800, cur 1590683510 expire 1590683360 last 1590683283 [ 1396.173855] Lustre: MGS: haven't heard from client d211235d-6c38-ba8e-1ed0-bb7162f9e19d (at 10.151.27.26@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980f48bd400, cur 1590683523 expire 1590683373 last 1590683296 [ 1396.243411] Lustre: Skipped 2 previous similar messages [ 1478.539472] LNet: 7739:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.26@o2ib version 12/12 incarnation 1588985836754405/1590683600614443 [ 1478.590590] Lustre: MGS: Connection restored to d211235d-6c38-ba8e-1ed0-bb7162f9e19d (at 10.151.27.26@o2ib) [ 1478.590594] Lustre: Skipped 635 previous similar messages [ 2291.454012] Lustre: MGS: Connection restored to 729863a9-1454-5147-12d2-dc6be130fdc2 (at 10.151.3.42@o2ib) [ 2291.454017] Lustre: Skipped 9 previous similar messages [ 2913.673199] Lustre: MGS: Connection restored to 4fc9a9cb-fc19-bea5-fbeb-a1f226598528 (at 10.149.1.41@o2ib313) [ 2913.673205] Lustre: Skipped 3959 previous similar messages [ 3011.230324] Lustre: nbp8-MDT0000: haven't heard from client 5b6acf67-c3e9-ff25-8eaf-52fdd662e3da (at 10.151.45.89@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a364e5dc00, cur 1590685138 expire 1590684988 last 1590684911 [ 3116.144040] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 3116.177247] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.89@o2ib (331): c: 30, oc: 0, rc: 32 [ 3599.487287] Lustre: MGS: Connection restored to 966e1a77-e4c5-bfce-5a66-3b05da6c5ddb (at 10.151.47.33@o2ib) [ 3599.487293] Lustre: Skipped 1499 previous similar messages [ 4116.270800] Lustre: nbp8-MDT0000: haven't heard from client 575181d1-4628-4124-e3cb-b36dd139f152 (at 10.151.29.29@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cf67d7000, cur 1590686243 expire 1590686093 last 1590686016 [ 4116.342904] Lustre: Skipped 1 previous similar message [ 4123.274589] Lustre: MGS: haven't heard from client 08eaa415-d1b2-60e0-2ff6-709226d423e1 (at 10.151.34.96@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3dc73b000, cur 1590686250 expire 1590686100 last 1590686023 [ 4123.344122] Lustre: Skipped 41 previous similar messages [ 4201.183931] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4201.217146] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.173@o2ib (302): c: 31, oc: 0, rc: 32 [ 4202.183948] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4202.217162] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.73@o2ib (304): c: 31, oc: 0, rc: 32 [ 4204.184024] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [ 4204.217239] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.131@o2ib (306): c: 31, oc: 0, rc: 32 [ 4207.184127] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4207.217334] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [ 4207.250249] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.137@o2ib (310): c: 31, oc: 0, rc: 32 [ 4207.290890] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [ 4208.179250] Lustre: MGS: Connection restored to e4770c49-635c-4ee9-0d2a-0b76e598a6b4 (at 10.151.46.36@o2ib) [ 4208.179255] Lustre: Skipped 521 previous similar messages [ 4213.184348] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4213.217571] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.48@o2ib (315): c: 31, oc: 0, rc: 32 [ 4222.184677] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4222.217899] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 16 previous similar messages [ 4222.251393] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.33@o2ib (324): c: 31, oc: 0, rc: 32 [ 4222.291739] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 16 previous similar messages [ 4242.185415] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4242.218638] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 10 previous similar messages [ 4242.252124] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.153@o2ib (345): c: 31, oc: 0, rc: 32 [ 4242.292760] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 10 previous similar messages [ 4276.186670] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4276.219892] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [ 4276.253095] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.41@o2ib (351): c: 30, oc: 0, rc: 32 [ 4276.293438] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [ 4430.281317] Lustre: nbp8-MDT0000: haven't heard from client 547636ac-379d-a6a6-19f5-95350e3b8dad (at 10.151.28.193@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cf72d6c00, cur 1590686557 expire 1590686407 last 1590686330 [ 4430.353733] Lustre: Skipped 41 previous similar messages [ 4527.195879] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 4527.229089] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.193@o2ib (323): c: 30, oc: 0, rc: 32 [ 4809.329457] Lustre: MGS: Connection restored to 3d1ebb70-381b-14d8-b678-4098ff783366 (at 10.151.42.242@o2ib) [ 4809.329463] Lustre: Skipped 707 previous similar messages [ 5120.547183] LNet: 1116:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.3.55@o2ib version 12/12 incarnation 1588819989626245/1590687188544346 [ 5318.752267] perf: interrupt took too long (2530 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [ 5410.726458] Lustre: MGS: Connection restored to becac59d-8635-bc31-7a34-6cf27cc20048 (at 10.151.32.200@o2ib) [ 5410.726463] Lustre: Skipped 105 previous similar messages [ 6013.653097] Lustre: MGS: Connection restored to ad1617f4-b06f-6e54-b353-91ab16bdd5b3 (at 10.151.10.72@o2ib) [ 6013.653108] Lustre: Skipped 1267 previous similar messages [ 6323.501915] perf: interrupt took too long (3169 > 3162), lowering kernel.perf_event_max_sample_rate to 63000 [ 6504.357088] Lustre: nbp8-MDT0000: haven't heard from client 8741c74f-18e5-cab3-8648-7a3e36b6cb55 (at 10.151.7.113@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897db943d000, cur 1590688631 expire 1590688481 last 1590688404 [ 6504.429200] Lustre: Skipped 3 previous similar messages [ 6597.271379] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [ 6597.304600] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [ 6597.337516] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.113@o2ib (320): c: 30, oc: 0, rc: 32 [ 6597.377865] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [ 6645.693131] Lustre: MGS: Connection restored to f4a41fda-735d-781b-189c-473ba8c3c973 (at 10.151.46.114@o2ib) [ 6645.693137] Lustre: Skipped 843 previous similar messages [ 7046.287793] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 7046.321000] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.162@o2ib (216): c: 32, oc: 0, rc: 32 [ 7315.037170] Lustre: MGS: Connection restored to 081f5f86-b7a8-f83e-b1f3-5f0095ae6587 (at 10.141.5.87@o2ib417) [ 7315.037176] Lustre: Skipped 209 previous similar messages [ 7971.390813] Lustre: MGS: Connection restored to 47c876a4-4c63-a0c4-e616-dbaabdd5ce3a (at 10.151.54.147@o2ib) [ 7971.390822] Lustre: Skipped 131 previous similar messages [ 8103.269558] perf: interrupt took too long (3982 > 3961), lowering kernel.perf_event_max_sample_rate to 50000 [ 8611.858593] Lustre: MGS: Connection restored to 7a9d0173-8cbe-af0b-4e2a-5578b7c1d0c2 (at 10.151.28.177@o2ib) [ 8611.858599] Lustre: Skipped 189 previous similar messages [ 9225.147317] Lustre: MGS: Connection restored to e7174936-d652-909b-751b-017cf8d49fe6 (at 10.141.2.198@o2ib417) [ 9225.147323] Lustre: Skipped 161 previous similar messages [ 9780.387091] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [ 9780.420306] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.162@o2ib (303): c: 32, oc: 0, rc: 32 [ 9825.389510] Lustre: nbp8-MDT0000: Connection restored to dfb866cc-9ba2-f613-a734-c6d610d0e75c (at 10.151.33.59@o2ib) [ 9825.389515] Lustre: Skipped 102 previous similar messages [10475.720521] Lustre: MGS: Connection restored to 23ce3b74-912e-f976-9b94-abb625c98ad1 (at 10.151.29.30@o2ib) [10475.720527] Lustre: Skipped 290 previous similar messages [11221.363240] Lustre: MGS: Connection restored to cfd65234-a1c6-a41d-a9f5-92798a0b911e (at 10.151.3.35@o2ib) [11221.363245] Lustre: Skipped 1877 previous similar messages [11576.542609] Lustre: nbp8-MDT0000: haven't heard from client a00a2510-c1d6-8675-d307-8ba7e6e6a7b2 (at 10.153.13.77@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897df71b8800, cur 1590693703 expire 1590693553 last 1590693476 [11576.615578] Lustre: Skipped 1 previous similar message [11816.550906] Lustre: nbp8-MDT0000: haven't heard from client 7527a0d3-2f46-7505-1395-733c7e8f7f04 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d44e3bc00, cur 1590693943 expire 1590693793 last 1590693716 [11816.623871] Lustre: Skipped 1 previous similar message [11821.680312] Lustre: nbp8-MDT0000: Connection restored to f7762624-e011-1921-fc37-be511f80bba1 (at 10.151.33.73@o2ib) [11821.680317] Lustre: Skipped 402 previous similar messages [11861.505637] perf: interrupt took too long (4980 > 4977), lowering kernel.perf_event_max_sample_rate to 40000 [11892.554587] Lustre: nbp8-MDT0000: haven't heard from client a7d10738-5e12-520b-919f-39fffb2ab67c (at 10.151.23.188@o2ib) in 201 seconds. I think it's dead, and I am evicting it. exp ffff89a3a99ce000, cur 1590694019 expire 1590693869 last 1590693818 [11892.626986] Lustre: Skipped 1 previous similar message [12015.469532] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [12015.502746] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.187@o2ib (323): c: 31, oc: 0, rc: 32 [12019.468682] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [12019.501887] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [12019.534802] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.195@o2ib (327): c: 31, oc: 0, rc: 32 [12019.575438] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [12020.469732] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [12021.468761] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.199@o2ib (329): c: 31, oc: 0, rc: 32 [12021.509397] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [12135.563188] Lustre: nbp8-MDT0000: haven't heard from client 7c5e3a37-bdd6-b430-988f-63d032b4f2dd (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3d56aa800, cur 1590694262 expire 1590694112 last 1590694035 [12135.636156] Lustre: Skipped 11 previous similar messages [12430.674029] Lustre: MGS: Connection restored to 17d050b0-bf82-e2f4-23fe-be7cf768ac73 (at 10.151.43.187@o2ib) [12430.674035] Lustre: Skipped 254 previous similar messages [12916.502437] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [12916.535650] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [12916.568843] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.149@o2ib (304): c: 32, oc: 0, rc: 32 [12916.609478] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [13092.508870] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [13092.542081] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.200@o2ib (279): c: 32, oc: 0, rc: 32 [13115.350860] Lustre: MGS: Connection restored to cb5b5302-fece-c58b-3f8c-8462e2138fd2 (at 10.149.2.182@o2ib313) [13115.350866] Lustre: Skipped 219 previous similar messages [13670.528970] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [13670.562185] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.110@o2ib (229): c: 32, oc: 0, rc: 32 [13927.449783] Lustre: MGS: Connection restored to 81fa95ac-00e0-d74a-7f82-c2d0a7dd4aea (at 10.151.11.213@o2ib) [13927.449789] Lustre: Skipped 189 previous similar messages [14194.637038] Lustre: nbp8-MDT0000: haven't heard from client ad07c08c-a8f9-e216-38a6-6285ac4dcaa8 (at 10.149.2.182@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e9b98c00, cur 1590696321 expire 1590696171 last 1590696094 [14194.710003] Lustre: Skipped 9 previous similar messages [14592.360612] Lustre: MGS: Connection restored to 9e667393-f451-b8b1-6e4d-2ebc5c22a429 (at 10.149.5.85@o2ib313) [14592.360618] Lustre: Skipped 297 previous similar messages [14779.569892] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [14779.603105] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.106@o2ib (303): c: 32, oc: 0, rc: 32 [15864.135269] Lustre: MGS: Connection restored to 646f8061-e6a6-d54e-4e76-298ed46801c4 (at 10.151.39.107@o2ib) [15864.135275] Lustre: Skipped 1 previous similar message [15956.638358] Lustre: MGS: Connection restored to f130cb08-8717-bbfd-519d-c27ef5211e5a (at 10.151.7.85@o2ib) [15956.638364] Lustre: Skipped 31 previous similar messages [16107.706949] Lustre: MGS: haven't heard from client 71a26561-5a22-3abe-14c1-accedfb0b0f7 (at 10.141.2.250@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897b8ee99000, cur 1590698234 expire 1590698084 last 1590698007 [16107.777338] Lustre: Skipped 1 previous similar message [16174.004138] Lustre: MGS: Connection restored to 71a26561-5a22-3abe-14c1-accedfb0b0f7 (at 10.141.2.250@o2ib417) [16174.004144] Lustre: Skipped 299 previous similar messages [16240.621790] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [16240.654995] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.116@o2ib (311): c: 30, oc: 0, rc: 32 [16267.714263] Lustre: nbp8-MDT0000: haven't heard from client 835494f6-161c-2397-1ae3-a4144f93a6d8 (at 10.141.2.238@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e4a3b800, cur 1590698394 expire 1590698244 last 1590698167 [16267.787233] Lustre: Skipped 6 previous similar messages [16512.608025] Lustre: nbp8-MDT0000: Client 33f169de-7464-34f2-a0db-927d3a84cad8 (at 10.141.2.243@o2ib417) reconnecting [16512.642694] Lustre: nbp8-MDT0000: Connection restored to adb1e05b-6eb9-45a0-7f97-1fad592e3103 (at 10.141.2.243@o2ib417) [16512.642697] Lustre: Skipped 54 previous similar messages [16666.573086] Lustre: MGS: Received new LWP connection from 10.141.2.249@o2ib417, removing former export from same NID [16674.643115] Lustre: nbp8-MDT0000: Client fd81eb6e-f223-0494-0ecb-c98e3db08fd0 (at 10.141.2.246@o2ib417) reconnecting [16684.705368] Lustre: nbp8-MDT0000: Client 8db2138b-a22b-16c4-0752-067d57fda5d9 (at 10.141.2.247@o2ib417) reconnecting [16688.713678] LNet: 27173:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.93@o2ib version 12/12 incarnation 1587508741328094/1590698810766285 [16693.730689] Lustre: MGS: haven't heard from client ac57b89b-d26c-c16e-a5df-0249a06e7d41 (at 10.151.27.93@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ba8de800, cur 1590698820 expire 1590698670 last 1590698593 [16693.800217] Lustre: Skipped 13 previous similar messages [16710.728218] Lustre: nbp8-MDT0000: haven't heard from client 46c28389-03d5-d65a-3627-878046a89ee7 (at 10.151.27.93@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c42873800, cur 1590698837 expire 1590698687 last 1590698610 [16769.733753] Lustre: MGS: haven't heard from client a2a1c6dd-c77c-5c84-2409-ae62f19687ee (at 10.151.9.180@o2ib) in 161 seconds. I think it's dead, and I am evicting it. exp ffff897dfe0bc800, cur 1590698896 expire 1590698746 last 1590698735 [16786.731834] Lustre: nbp8-MDT0000: haven't heard from client fef5e254-f554-72ce-f900-eb873ca788a2 (at 10.151.9.180@o2ib) in 178 seconds. I think it's dead, and I am evicting it. exp ffff897dc64bec00, cur 1590698913 expire 1590698763 last 1590698735 [16812.568029] LNet: 27173:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.9.180@o2ib version 12/12 incarnation 1588859516574293/1590698875986821 [17190.780248] Lustre: MGS: Connection restored to 61f6ae37-d61b-ed96-69db-ad62cc7e10f9 (at 10.151.44.248@o2ib) [17190.780254] Lustre: Skipped 265 previous similar messages [17792.206556] Lustre: MGS: Connection restored to 15bf08b7-69af-4267-d1ef-82ea105fda04 (at 10.151.34.153@o2ib) [17792.206562] Lustre: Skipped 263 previous similar messages [18536.966514] Lustre: MGS: Connection restored to a93c8408-5a3f-3070-9dbb-849e2a3d7e23 (at 10.151.51.240@o2ib) [18536.966520] Lustre: Skipped 35 previous similar messages [18696.802102] Lustre: nbp8-MDT0000: haven't heard from client 5d7b0ecc-7d2b-088b-ec6b-7122c0028f97 (at 10.149.1.69@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a35c3c4800, cur 1590700823 expire 1590700673 last 1590700596 [18772.804675] Lustre: nbp8-MDT0000: haven't heard from client fd81eb6e-f223-0494-0ecb-c98e3db08fd0 (at 10.141.2.246@o2ib417) in 183 seconds. I think it's dead, and I am evicting it. exp ffff89a3d1a97000, cur 1590700899 expire 1590700749 last 1590700716 [18772.877769] Lustre: Skipped 1 previous similar message [18780.714595] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [18780.747801] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.168@o2ib (280): c: 32, oc: 0, rc: 32 [19080.726535] LustreError: 5715:0:(ldlm_lib.c:3269:target_bulk_io()) @@@ timeout on bulk READ after 200+1590682128s req@ffff897ddd3a0050 x1666008801351632/t0(0) o256->fe53dd49-4464-27e9-4f48-5909e834f96b@10.141.2.245@o2ib417:159/0 lens 304/240 e 0 to 0 dl 1590701354 ref 1 fl Interpret:/0/0 rc 0/0 [19096.815183] Lustre: nbp8-MDT0000: haven't heard from client e58f4285-e702-daf4-7901-fabcdbc9635c (at 10.141.2.248@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3fb755400, cur 1590701223 expire 1590701073 last 1590700996 [19096.888151] Lustre: Skipped 17 previous similar messages [19115.821038] Lustre: MGS: haven't heard from client 65e66e22-b27f-5375-c793-912288d2dab9 (at 10.141.2.246@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3c8518400, cur 1590701242 expire 1590701092 last 1590701015 [19115.891430] Lustre: Skipped 6 previous similar messages [19159.518465] Lustre: MGS: Connection restored to ed4d1805-d4d9-2bb2-e5ad-67e245d2f7c9 (at 10.151.0.208@o2ib) [19159.518471] Lustre: Skipped 111 previous similar messages [19499.518070] Lustre: MGS: Received new LWP connection from 10.141.2.248@o2ib417, removing former export from same NID [19759.664249] Lustre: nbp8-MDT0000: Connection restored to 21c3e9f4-eac5-1f4a-dd2a-6bc5299d1312 (at 10.151.33.58@o2ib) [19759.664255] Lustre: Skipped 286 previous similar messages [20366.676388] Lustre: MGS: Connection restored to 784586f6-6002-ef40-704c-071e201b3994 (at 10.151.36.31@o2ib) [20366.676393] Lustre: Skipped 6 previous similar messages [21055.213936] Lustre: MGS: Connection restored to 6900764e-dc52-0b6e-ded6-a924f24b3dd1 (at 10.151.35.204@o2ib) [21055.213942] Lustre: Skipped 31 previous similar messages [21657.166065] Lustre: MGS: Connection restored to ef8c0df3-346d-d794-50a1-4b9c52c718cd (at 10.141.3.44@o2ib417) [21657.166071] Lustre: Skipped 185 previous similar messages [22357.872089] Lustre: MGS: Connection restored to 11a76c06-767e-4805-9c13-08a564dddcaa (at 10.151.7.113@o2ib) [22357.872095] Lustre: Skipped 57 previous similar messages [22966.666642] Lustre: MGS: Connection restored to dc4aaaea-1c0b-e8a9-aee3-3f1f3ac8ac7d (at 10.151.29.238@o2ib) [22966.666648] Lustre: Skipped 55 previous similar messages [23588.138349] Lustre: MGS: Connection restored to e5f389e8-c4bd-5de2-0f25-85188aa00fa2 (at 10.151.39.104@o2ib) [23588.138355] Lustre: Skipped 265 previous similar messages [24210.215011] Lustre: MGS: Connection restored to 26cd0906-e349-34da-cb56-c78093b65df9 (at 10.149.3.24@o2ib313) [24210.215017] Lustre: Skipped 9 previous similar messages [24812.495554] Lustre: MGS: Connection restored to 721ff43f-3fce-59b1-502a-d08164034096 (at 10.151.53.159@o2ib) [24812.495560] Lustre: Skipped 205 previous similar messages [25440.413868] Lustre: MGS: Connection restored to 512d122d-4a60-6063-c1f5-2108dee13060 (at 10.149.6.130@o2ib313) [25440.413875] Lustre: Skipped 367 previous similar messages [26097.072293] Lustre: nbp8-MDT0000: haven't heard from client f9012a18-f6cd-9298-d892-ff6639fa624b (at 10.151.46.122@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c51c33000, cur 1590708223 expire 1590708073 last 1590707996 [26097.144681] Lustre: Skipped 5 previous similar messages [26115.239466] Lustre: MGS: Connection restored to 1844af35-1b9d-0daf-9bb8-e034e0c164ff (at 10.141.3.210@o2ib417) [26115.239472] Lustre: Skipped 45 previous similar messages [26184.986361] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [26185.019576] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.180@o2ib (313): c: 30, oc: 0, rc: 32 [26188.987487] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [26189.020694] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.122@o2ib (317): c: 30, oc: 0, rc: 32 [26190.986547] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [26191.019755] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.111@o2ib (303): c: 30, oc: 0, rc: 32 [26204.987065] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [26205.020271] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.78@o2ib (331): c: 30, oc: 0, rc: 32 [26211.987323] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [26212.020497] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [26212.053405] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.182@o2ib (340): c: 30, oc: 0, rc: 32 [26212.094040] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [26236.146511] Lustre: MGS: Received new LWP connection from 10.141.2.249@o2ib417, removing former export from same NID [26236.146513] Lustre: nbp8-MDT0000: Client 6f85f8e6-09e6-dcd8-0a12-6b042e718a14 (at 10.141.2.249@o2ib417) reconnecting [26239.686904] Lustre: nbp8-MDT0000: Client e58f4285-e702-daf4-7901-fabcdbc9635c (at 10.141.2.248@o2ib417) reconnecting [26239.686912] Lustre: MGS: Received new LWP connection from 10.141.2.248@o2ib417, removing former export from same NID [26239.988396] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [26240.021603] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.34@o2ib (341): c: 30, oc: 0, rc: 32 [26242.175206] Lustre: MGS: Received new LWP connection from 10.141.2.246@o2ib417, removing former export from same NID [26242.175229] Lustre: nbp8-MDT0000: Client fd81eb6e-f223-0494-0ecb-c98e3db08fd0 (at 10.141.2.246@o2ib417) reconnecting [26254.000998] Lustre: MGS: Received new LWP connection from 10.141.2.245@o2ib417, removing former export from same NID [26254.001026] Lustre: nbp8-MDT0000: Client b904b04e-6bc1-d498-e070-763801e54cd4 (at 10.141.2.245@o2ib417) reconnecting [26254.001028] Lustre: Skipped 1 previous similar message [26254.087211] Lustre: Skipped 1 previous similar message [26352.082549] Lustre: nbp8-MDT0000: haven't heard from client 33f169de-7464-34f2-a0db-927d3a84cad8 (at 10.141.2.243@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d292da000, cur 1590708478 expire 1590708328 last 1590708251 [26352.155517] Lustre: Skipped 13 previous similar messages [26428.084309] Lustre: nbp8-MDT0000: haven't heard from client 6f85f8e6-09e6-dcd8-0a12-6b042e718a14 (at 10.141.2.249@o2ib417) in 192 seconds. I think it's dead, and I am evicting it. exp ffff89a3ad2a4000, cur 1590708554 expire 1590708404 last 1590708362 [26428.157274] Lustre: Skipped 15 previous similar messages [26505.999133] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [26506.032347] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.157@o2ib (315): c: 30, oc: 0, rc: 32 [26542.999455] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [26543.032668] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [26543.065584] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.153@o2ib (349): c: 30, oc: 0, rc: 32 [26543.106219] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [26653.093580] Lustre: nbp8-MDT0000: haven't heard from client 1edc8f1d-6094-1626-4ec9-6ea0e4cb938f (at 10.141.2.244@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2d7e2e400, cur 1590708779 expire 1590708629 last 1590708552 [26653.166943] Lustre: Skipped 5 previous similar messages [26719.602627] Lustre: MGS: Connection restored to b0c6738f-1aaf-7e72-cc09-1dfe0340b84a (at 10.141.2.244@o2ib417) [26719.602633] Lustre: Skipped 97 previous similar messages [26864.948464] Lustre: MGS: Received new LWP connection from 10.141.2.249@o2ib417, removing former export from same NID [26864.983119] Lustre: Skipped 1 previous similar message [26875.083262] Lustre: MGS: Received new LWP connection from 10.141.2.246@o2ib417, removing former export from same NID [26875.117919] Lustre: Skipped 1 previous similar message [27387.365181] Lustre: MGS: Connection restored to 2c506802-b0e6-37fd-86f9-ad132d71f6a6 (at 10.151.9.176@o2ib) [27387.365186] Lustre: Skipped 87 previous similar messages [27457.812520] perf: interrupt took too long (6228 > 6225), lowering kernel.perf_event_max_sample_rate to 32000 [27754.133113] Lustre: nbp8-MDT0000: haven't heard from client 77cfb372-3ba9-59aa-8eba-297314e8774c (at 10.141.2.128@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3a93fe400, cur 1590709880 expire 1590709730 last 1590709653 [27754.206077] Lustre: Skipped 3 previous similar messages [28033.007848] Lustre: MGS: Connection restored to d8403cb3-f5a8-f2eb-b6f2-89b790fbf15c (at 10.151.19.102@o2ib) [28033.007854] Lustre: Skipped 157 previous similar messages [28652.670590] Lustre: MGS: Connection restored to 95e32009-0317-84e8-8767-5e375ae6c4bf (at 10.151.56.41@o2ib) [28652.670595] Lustre: Skipped 107 previous similar messages [28983.178919] Lustre: nbp8-MDT0000: haven't heard from client e21bc315-267e-3ce6-c0c1-40e0b85ed51a (at 10.141.2.27@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2dfbba400, cur 1590711109 expire 1590710959 last 1590710882 [28983.251601] Lustre: Skipped 1 previous similar message [29059.181760] Lustre: nbp8-MDT0000: haven't heard from client 38620884-911d-02db-3f73-79869268c588 (at 10.141.2.40@o2ib417) in 189 seconds. I think it's dead, and I am evicting it. exp ffff897dc37d4800, cur 1590711185 expire 1590711035 last 1590710996 [29059.254444] Lustre: Skipped 9 previous similar messages [29256.944645] Lustre: MGS: Connection restored to a71d7449-a181-c9c8-be14-9e9e9c03ffe8 (at 10.151.19.138@o2ib) [29256.944651] Lustre: Skipped 17 previous similar messages [29973.605017] Lustre: MGS: Connection restored to c5471be5-c615-563a-93ab-0d41253e1f71 (at 10.151.37.115@o2ib) [29973.605023] Lustre: Skipped 207 previous similar messages [30699.938725] Lustre: MGS: Connection restored to 279d8f58-a7f8-028f-8f68-3d96f47f808d (at 10.151.54.52@o2ib) [30699.938731] Lustre: Skipped 19 previous similar messages [31300.164192] Lustre: MGS: Connection restored to 625543e2-36f6-fa86-508d-d9834122b0da (at 10.141.6.139@o2ib417) [31300.164198] Lustre: Skipped 57 previous similar messages [31495.271068] Lustre: nbp8-MDT0000: haven't heard from client 6fce14f1-4999-78b4-209d-ff99f23e65d3 (at 10.151.43.151@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897da9dfc000, cur 1590713621 expire 1590713471 last 1590713394 [31495.343462] Lustre: Skipped 1 previous similar message [31576.183487] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [31576.216693] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [31576.249907] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.151@o2ib (308): c: 30, oc: 0, rc: 32 [31576.290544] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [31967.503960] Lustre: MGS: Connection restored to 2ba414d8-3bb2-0ed2-3f7e-fc834275e985 (at 10.149.1.76@o2ib313) [31967.503966] Lustre: Skipped 319 previous similar messages [32622.310896] Lustre: nbp8-MDT0000: haven't heard from client 9b01c192-266c-1030-d039-dffec30e2044 (at 10.151.28.163@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e3e86000, cur 1590714748 expire 1590714598 last 1590714521 [32622.383299] Lustre: Skipped 1 previous similar message [32642.943005] Lustre: MGS: Connection restored to afd05b70-a11b-74db-b509-8f534db31651 (at 10.151.54.27@o2ib) [32642.943011] Lustre: Skipped 245 previous similar messages [32743.227108] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [32743.260319] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.163@o2ib (347): c: 30, oc: 0, rc: 32 [33346.172464] Lustre: MGS: Connection restored to 6ea310a4-f446-23fb-f4ee-11b4b1ccf2eb (at 10.151.54.161@o2ib) [33346.172470] Lustre: Skipped 73 previous similar messages [34177.050049] Lustre: MGS: Connection restored to 6d97ffee-adbe-3ce8-f4f7-7a379e25af64 (at 10.151.57.120@o2ib) [34177.050054] Lustre: Skipped 289 previous similar messages [34862.876696] Lustre: MGS: Connection restored to 3a17cdd6-cd4a-718b-6adf-85370b1ad112 (at 10.149.4.222@o2ib313) [34862.876701] Lustre: Skipped 97 previous similar messages [35464.325598] Lustre: MGS: Connection restored to f2bb9b5a-ccbe-938b-aca0-1c0f7712f0dc (at 10.151.33.35@o2ib) [35464.325604] Lustre: Skipped 319 previous similar messages [36320.626954] Lustre: MGS: Connection restored to 948eba72-36b2-9608-5ee3-bb4e4e3d6f5a (at 10.151.54.92@o2ib) [36320.626960] Lustre: Skipped 101 previous similar messages [36939.566387] Lustre: MGS: Connection restored to 8b7e9a8a-d6d1-3818-9797-c6db7b01ce44 (at 10.151.32.23@o2ib) [36939.566392] Lustre: Skipped 351 previous similar messages [37953.690166] Lustre: MGS: Connection restored to 87f19c83-5f0b-c2a4-65e6-7d6921f4cc25 (at 10.151.54.94@o2ib) [37953.690172] Lustre: Skipped 77 previous similar messages [38610.511319] Lustre: MGS: Connection restored to b785ec7d-635f-ebe0-9341-27a9c7f24b09 (at 10.151.51.159@o2ib) [38610.511325] Lustre: Skipped 225 previous similar messages [39375.762542] Lustre: MGS: Connection restored to 9c65b6a4-0c74-a44a-26da-b732a2f363fc (at 10.151.7.71@o2ib) [39375.762548] Lustre: Skipped 25 previous similar messages [40036.271565] Lustre: MGS: Connection restored to dc82c09d-cce4-4839-aaf4-b561803f1a3c (at 10.151.35.126@o2ib) [40036.271571] Lustre: Skipped 23 previous similar messages [40674.751007] Lustre: MGS: Connection restored to 7fa30b1e-d4a1-48b1-9a47-a651dd96225a (at 10.151.50.175@o2ib) [40674.751013] Lustre: Skipped 53 previous similar messages [41305.168539] Lustre: MGS: Connection restored to 8f580602-19b7-bc71-07bd-bbeef1c11bfc (at 10.151.32.126@o2ib) [41305.168545] Lustre: Skipped 565 previous similar messages [41925.128292] Lustre: MGS: Connection restored to 52f47afb-86af-9fd2-2bae-c9124eaa3536 (at 10.151.56.17@o2ib) [41925.128298] Lustre: Skipped 57 previous similar messages [42526.716946] Lustre: MGS: Connection restored to edba9f51-cf87-3622-27cf-2856ae5b2587 (at 10.151.3.51@o2ib) [42526.716952] Lustre: Skipped 69 previous similar messages [43127.151353] Lustre: nbp8-MDT0000: Connection restored to 59e5fb3d-6b69-1e80-6c80-0117a31b26e5 (at 10.151.32.47@o2ib) [43127.151358] Lustre: Skipped 46 previous similar messages [43732.859892] Lustre: MGS: Connection restored to a0a0bac8-2c3a-9139-87b2-d9b89bdb6497 (at 10.151.56.18@o2ib) [43732.859897] Lustre: Skipped 98 previous similar messages [44339.794573] Lustre: MGS: Connection restored to 6d361a61-1855-6f6e-bd0a-870b33899b0a (at 10.151.56.56@o2ib) [44339.794579] Lustre: Skipped 17 previous similar messages [44963.397241] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [44963.397247] Lustre: Skipped 19 previous similar messages [45649.096409] Lustre: MGS: Connection restored to cac98f4f-09fd-ce73-9452-5acc8f8da83a (at 10.151.56.23@o2ib) [45649.096414] Lustre: Skipped 23 previous similar messages [46255.641200] Lustre: MGS: Connection restored to a6612b3c-e35e-f978-4e6b-38667ddb6601 (at 10.151.43.38@o2ib) [46255.641206] Lustre: Skipped 65 previous similar messages [46970.040069] Lustre: MGS: Connection restored to 81e19d23-1c28-9320-e2d0-47134a6f2ea8 (at 10.151.56.14@o2ib) [46970.040074] Lustre: Skipped 41 previous similar messages [47647.054028] Lustre: MGS: Connection restored to 978adafd-41be-5f86-5dce-a5d900bb33ce (at 10.151.19.167@o2ib) [47647.054033] Lustre: Skipped 7 previous similar messages [48320.615506] Lustre: MGS: Connection restored to 9ce1dcf9-8553-734b-818c-9ba1a9291a51 (at 10.149.2.155@o2ib313) [48320.615512] Lustre: Skipped 369 previous similar messages [49099.396111] Lustre: MGS: Connection restored to 2ab97ebd-c8c9-103c-4afa-834460f233f5 (at 10.151.34.24@o2ib) [49099.396117] Lustre: Skipped 329 previous similar messages [49874.906593] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [49874.906599] Lustre: Skipped 31 previous similar messages [50871.407346] Lustre: MGS: Connection restored to 81d7861d-20b6-3bf5-e7da-dc2ede3aae93 (at 10.151.36.218@o2ib) [50871.407351] Lustre: Skipped 1 previous similar message [51580.810832] Lustre: MGS: Connection restored to d2ecb73a-5185-e15c-8aa2-f7c9cdb83a48 (at 10.151.35.132@o2ib) [51580.810843] Lustre: Skipped 1063 previous similar messages [52481.260534] Lustre: MGS: Connection restored to fafa8a83-30be-8b20-8ae9-2f55b23d7e15 (at 10.151.8.47@o2ib) [52481.260539] Lustre: Skipped 109 previous similar messages [53103.524037] Lustre: MGS: Connection restored to 869d143e-2fea-053f-f4bf-38a4d3c0f97e (at 10.151.46.165@o2ib) [53103.524042] Lustre: Skipped 205 previous similar messages [53753.172943] Lustre: MGS: Connection restored to 36bbb84d-8265-5227-fa31-0e9d42397877 (at 10.151.1.176@o2ib) [53753.172948] Lustre: Skipped 63 previous similar messages [54677.487475] Process accounting resumed [54838.969564] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [54838.969570] Lustre: Skipped 41 previous similar messages [55329.052026] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [55329.085232] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.120@o2ib (303): c: 32, oc: 0, rc: 32 [55670.817247] Lustre: MGS: Connection restored to 036e3026-6198-f8ad-1f59-2b8a40c3dd07 (at 10.149.10.47@o2ib313) [55670.817253] Lustre: Skipped 5 previous similar messages [56822.106274] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [56822.139489] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.110@o2ib (297): c: 32, oc: 0, rc: 32 [56836.334313] Lustre: MGS: Connection restored to f5cf3aaa-853f-755e-baec-8635e53803c2 (at 10.151.34.17@o2ib) [56836.334320] Lustre: Skipped 323 previous similar messages [57688.007049] Lustre: MGS: Connection restored to 11a76c06-767e-4805-9c13-08a564dddcaa (at 10.151.7.113@o2ib) [57688.007054] Lustre: Skipped 107 previous similar messages [58427.866598] Lustre: MGS: Connection restored to b1ccc78c-3bc8-eb1c-636a-72417a91c14b (at 10.151.23.221@o2ib) [58427.866604] Lustre: Skipped 273 previous similar messages [59034.108061] Lustre: MGS: Connection restored to 73703c08-87dc-a944-f54b-84f8862ccaec (at 10.151.33.188@o2ib) [59034.108066] Lustre: Skipped 175 previous similar messages [59793.206738] Lustre: MGS: Connection restored to 2a8d2439-c44e-aec6-2cd2-bba9e6549b2f (at 10.151.36.118@o2ib) [59793.206743] Lustre: Skipped 3 previous similar messages [60473.183231] Lustre: MGS: Connection restored to e6f75a1d-81cd-b9b4-5268-59c65e094bd0 (at 10.151.34.157@o2ib) [60473.183237] Lustre: Skipped 19 previous similar messages [61357.449947] Lustre: MGS: Connection restored to 21c3e9f4-eac5-1f4a-dd2a-6bc5299d1312 (at 10.151.33.58@o2ib) [61357.449952] Lustre: Skipped 9 previous similar messages [62383.841389] Lustre: MGS: Connection restored to 5a9381bf-1680-7b97-8502-f72f6b761c27 (at 10.151.23.91@o2ib) [62383.841395] Lustre: Skipped 39 previous similar messages [63001.346878] Lustre: MGS: Connection restored to d7b40379-7b18-7d8b-4ace-27f0cc4f6f23 (at 10.151.34.32@o2ib) [63001.346884] Lustre: Skipped 5 previous similar messages [64478.880639] Lustre: MGS: Connection restored to c8939505-3f57-00e6-4b19-1eb55a1af472 (at 10.151.55.165@o2ib) [64478.880645] Lustre: Skipped 31 previous similar messages [64756.676822] Lustre: MGS: Connection restored to c4466eb0-8609-6fb8-6bda-2d1708e1467f (at 10.151.35.68@o2ib) [64756.676827] Lustre: Skipped 1 previous similar message [64923.700263] Lustre: MGS: Connection restored to 8f404389-722e-4337-fa17-d22e6994f97d (at 10.151.33.137@o2ib) [64923.700269] Lustre: Skipped 15 previous similar messages [66420.556663] Lustre: MGS: Connection restored to f820501a-04ba-7ee7-ff03-7d52ac1d53ab (at 10.151.34.44@o2ib) [66420.556669] Lustre: Skipped 19 previous similar messages [66522.842634] Lustre: MGS: Connection restored to 902b0ff4-2aeb-af3e-6e2f-9093f263c0b6 (at 10.151.33.132@o2ib) [66522.842640] Lustre: Skipped 1 previous similar message [66597.851924] Lustre: nbp8-MDT0000: Connection restored to 45304f16-e6a0-9541-3a7b-607568f26b01 (at 10.151.8.46@o2ib) [66597.851929] Lustre: Skipped 58 previous similar messages [66751.637387] Lustre: MGS: Connection restored to caaac0db-d49c-4ac3-bc1d-26a7393f693a (at 10.151.33.135@o2ib) [66751.637393] Lustre: Skipped 74 previous similar messages [67151.604939] Lustre: MGS: Connection restored to 1656f841-db77-2345-dc55-f4cdd189bf74 (at 10.151.23.140@o2ib) [67151.604945] Lustre: Skipped 27 previous similar messages [67977.806443] Lustre: MGS: Connection restored to fb51ef2e-88a2-aa85-8ce7-c0fc31031e2e (at 10.151.10.107@o2ib) [67977.806449] Lustre: Skipped 17 previous similar messages [69102.646929] Lustre: nbp8-MDT0000: haven't heard from client a041c8f0-639b-f85d-ba72-ed69f5abb762 (at 10.153.13.220@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ca5da5c00, cur 1590751227 expire 1590751077 last 1590751000 [69102.720185] Lustre: Skipped 1 previous similar message [69159.441627] Lustre: MGS: Connection restored to 866c4eff-b146-65e5-e50c-0f750d6dd23c (at 10.151.32.27@o2ib) [69159.441633] Lustre: Skipped 13 previous similar messages [70514.948265] Lustre: MGS: Connection restored to 03cdebae-b216-8770-4f6e-2601771ad717 (at 10.151.54.168@o2ib) [70514.948271] Lustre: Skipped 1 previous similar message [71143.100327] Lustre: MGS: Connection restored to b3d4cdf1-97f1-b45c-510b-37e17bb3b716 (at 10.151.7.108@o2ib) [71143.100333] Lustre: Skipped 159 previous similar messages [71341.313849] Lustre: MGS: Connection restored to 6ea310a4-f446-23fb-f4ee-11b4b1ccf2eb (at 10.151.54.161@o2ib) [71341.313855] Lustre: Skipped 19 previous similar messages [71493.735131] Lustre: nbp8-MDT0000: haven't heard from client d1cc5dd1-a71e-e34f-d24c-080facb19c03 (at 10.153.12.40@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e28a98400, cur 1590753618 expire 1590753468 last 1590753391 [71493.808142] Lustre: Skipped 1 previous similar message [74867.172439] Lustre: MGS: Connection restored to 659dc5ac-8c98-e774-e93e-e99f361ecd9f (at 10.151.30.169@o2ib) [74867.172445] Lustre: Skipped 79 previous similar messages [75123.346711] Lustre: MGS: Connection restored to c6ce301d-6ee4-a933-6786-4faf384dbc6a (at 10.141.2.236@o2ib417) [75123.346716] Lustre: Skipped 59 previous similar messages [75727.305175] Lustre: MGS: Connection restored to a82a713a-3567-6a5f-1ab8-2b2d94ff90fa (at 10.149.1.133@o2ib313) [75727.305181] Lustre: Skipped 17 previous similar messages [76543.920247] Lustre: nbp8-MDT0000: haven't heard from client 49177e4d-6795-4fd3-da13-b83c5eb8efbc (at 10.141.2.242@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3dd6a4000, cur 1590758668 expire 1590758518 last 1590758441 [76543.993206] Lustre: Skipped 3 previous similar messages [76823.955093] Lustre: MGS: Connection restored to f445d0d1-9ba9-9ecc-e477-e71233b3e595 (at 10.141.2.242@o2ib417) [76823.955099] Lustre: Skipped 1 previous similar message [77450.953434] Lustre: nbp8-MDT0000: haven't heard from client a4522323-9a7c-6718-ed11-760457a93927 (at 10.141.2.242@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a27f2fac00, cur 1590759575 expire 1590759425 last 1590759348 [77451.026369] Lustre: Skipped 1 previous similar message [77675.961626] Lustre: nbp8-MDT0000: haven't heard from client 2bf5f9f9-8629-b90d-f5db-c4335c0f81ac (at 10.151.11.28@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a350f5d800, cur 1590759800 expire 1590759650 last 1590759573 [77676.033742] Lustre: Skipped 13 previous similar messages [77768.874648] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [77768.907870] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.28@o2ib (318): c: 30, oc: 0, rc: 32 [78530.706848] Lustre: MGS: Connection restored to a8e54fa4-a808-bcd6-6163-0bda1c60dba3 (at 10.149.11.16@o2ib313) [78530.706854] Lustre: Skipped 1 previous similar message [78536.957452] Lustre: MGS: Connection restored to 5bd5ffc9-14b8-df9a-b857-081db70589ba (at 10.149.12.57@o2ib313) [78536.957458] Lustre: Skipped 1 previous similar message [78845.069711] Lustre: MGS: Connection restored to d67b54f7-1924-d98c-25ae-619e2d8cce80 (at 10.151.28.108@o2ib) [78845.069716] Lustre: Skipped 97 previous similar messages [79003.304044] Lustre: MGS: Connection restored to 91bffc85-701c-9d9b-b27f-06523fb9b1e2 (at 10.151.55.161@o2ib) [79003.304049] Lustre: Skipped 99 previous similar messages [79246.834058] Lustre: MGS: Connection restored to 2eba1cab-24eb-62c9-4965-2bcde5c4e630 (at 10.141.2.214@o2ib417) [79246.834068] Lustre: Skipped 1 previous similar message [80085.219006] Lustre: MGS: Connection restored to ae2ce2d7-609c-31c4-7188-49bb27b41c88 (at 10.151.3.45@o2ib) [80085.219011] Lustre: Skipped 939 previous similar messages [80092.246063] Lustre: MGS: Connection restored to 6a26c704-3e26-0689-dda3-7b69657a96e7 (at 10.151.7.42@o2ib) [80092.246070] Lustre: Skipped 1 previous similar message [80172.671427] Lustre: MGS: Connection restored to ea28c456-9c1c-4b9b-6e38-8f27494b0b95 (at 10.151.9.221@o2ib) [80172.671433] Lustre: Skipped 55 previous similar messages [80191.485566] Lustre: MGS: Connection restored to fb2b11ed-74fe-e7d9-0879-46e964034bbe (at 10.151.0.173@o2ib) [80191.485572] Lustre: Skipped 109 previous similar messages [80246.624220] Lustre: MGS: Connection restored to 0e8d7337-649e-af76-e03a-af274d8d9e18 (at 10.151.39.103@o2ib) [80246.624226] Lustre: Skipped 75 previous similar messages [80356.982101] Lustre: MGS: Connection restored to 73770fd8-44c4-40b1-823b-7218b2f0125d (at 10.151.56.20@o2ib) [80356.982107] Lustre: Skipped 57 previous similar messages [80503.974106] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80504.007326] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.215@o2ib (277): c: 32, oc: 0, rc: 32 [80507.974264] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80508.007471] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.237@o2ib (233): c: 32, oc: 0, rc: 32 [80511.974403] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80512.007604] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.245@o2ib (219): c: 32, oc: 0, rc: 32 [80514.974515] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80515.007722] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [80515.040922] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.237@o2ib (220): c: 32, oc: 0, rc: 32 [80515.081556] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [80519.974741] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80520.007954] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [80520.041156] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.248@o2ib (304): c: 32, oc: 0, rc: 32 [80520.081792] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [80520.116536] Lustre: MGS: Connection restored to a90e64d1-bdb1-9da5-22f7-89523b29c3f9 (at 10.151.30.119@o2ib) [80520.116540] Lustre: Skipped 105 previous similar messages [80532.976208] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [80533.009421] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 8 previous similar messages [80533.042618] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.220@o2ib (304): c: 32, oc: 0, rc: 32 [80533.083254] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 8 previous similar messages [80559.976215] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80560.009416] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [80560.042325] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.227@o2ib (295): c: 32, oc: 0, rc: 32 [80560.082958] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [80594.977453] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [80595.010672] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 21 previous similar messages [80595.044161] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.250@o2ib (304): c: 32, oc: 0, rc: 32 [80595.084798] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 21 previous similar messages [80977.944823] Lustre: MGS: Connection restored to 101f0beb-d365-180d-bd78-79dfef9aaf64 (at 10.151.43.10@o2ib) [80977.944829] Lustre: Skipped 171 previous similar messages [81650.345546] Lustre: MGS: Connection restored to c36882b2-82c9-6af7-dc48-1fdd47cab2c7 (at 10.151.55.188@o2ib) [81650.345552] Lustre: Skipped 447 previous similar messages [82745.157026] Lustre: MGS: Connection restored to 894914cc-8398-81b9-f374-fae12fdd849a (at 10.151.35.13@o2ib) [82745.157032] Lustre: Skipped 325 previous similar messages [83786.394136] Lustre: MGS: Connection restored to ae2ce2d7-609c-31c4-7188-49bb27b41c88 (at 10.151.3.45@o2ib) [83786.394142] Lustre: Skipped 3 previous similar messages [84520.550102] Lustre: MGS: Connection restored to 1288088f-168d-a1c7-ced4-7f8d94ba61c2 (at 10.141.2.118@o2ib417) [84520.550108] Lustre: Skipped 507 previous similar messages [85305.237898] Lustre: nbp8-MDT0000: haven't heard from client 501f2613-0cf2-3281-d50f-096f49f9940f (at 10.151.0.187@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e83a7000, cur 1590767429 expire 1590767279 last 1590767202 [85305.310008] Lustre: Skipped 1 previous similar message [85414.152762] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [85414.185968] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [85414.219177] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.46@o2ib (319): c: 30, oc: 0, rc: 32 [85414.259533] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [85430.153317] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [85430.186541] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [85430.219462] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.0.187@o2ib (352): c: 30, oc: 0, rc: 32 [85430.259802] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [86188.983623] Lustre: MGS: Connection restored to 2815d300-c88d-3010-1b5a-73f4e9f6f449 (at 10.151.17.46@o2ib) [86188.983633] Lustre: Skipped 311 previous similar messages [86283.126330] Lustre: MGS: Connection restored to 08472217-d939-c2c3-7d6a-db99b3620f0b (at 10.149.1.26@o2ib313) [86283.126335] Lustre: Skipped 11 previous similar messages [86472.610115] Lustre: MGS: Connection restored to 92a0f4ef-a318-23a8-fb6b-03a212dfc94a (at 10.151.30.23@o2ib) [86472.610120] Lustre: Skipped 31 previous similar messages [86775.982546] Lustre: MGS: Connection restored to 8b852f61-c1fa-b975-9dcc-822cc6f72ac2 (at 10.151.30.166@o2ib) [86775.982552] Lustre: Skipped 11 previous similar messages [87548.852694] Lustre: MGS: Connection restored to 292f8db8-6630-b90a-97eb-44bc0616fac2 (at 10.151.28.169@o2ib) [87548.852700] Lustre: Skipped 245 previous similar messages [88422.326702] Lustre: MGS: Connection restored to 9533b3e3-c6e4-201f-8234-c01301baaac4 (at 10.151.19.179@o2ib) [88422.326708] Lustre: Skipped 211 previous similar messages [89089.376990] Lustre: nbp8-MDT0000: haven't heard from client 20c15a51-9172-4998-08ca-22d60a3bbd96 (at 10.151.32.11@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a042e41c00, cur 1590771213 expire 1590771063 last 1590770986 [89089.449102] Lustre: Skipped 5 previous similar messages [89170.186525] LNet: 69485:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.32.11@o2ib version 12/12 incarnation 1589551035536772/1590771157534906 [89170.238705] Lustre: MGS: Connection restored to d9703531-97c2-541a-4026-8a8322871217 (at 10.151.32.11@o2ib) [89170.238709] Lustre: Skipped 377 previous similar messages [89662.397255] Lustre: nbp8-MDT0000: haven't heard from client b3be53e8-d9d2-0645-6629-6c9cc858b850 (at 10.151.32.11@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a0a6a3ec00, cur 1590771786 expire 1590771636 last 1590771559 [89662.469359] Lustre: Skipped 1 previous similar message [89741.310811] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [89741.344010] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.11@o2ib (305): c: 30, oc: 0, rc: 32 [90321.227755] Lustre: MGS: Connection restored to 38518503-06ba-230b-9f4d-3e82c004b841 (at 10.151.23.97@o2ib) [90321.227760] Lustre: Skipped 131 previous similar messages [90981.135511] Lustre: MGS: Connection restored to 0f6abe20-2ce7-7a91-deab-ea99275f92d8 (at 10.151.34.75@o2ib) [90981.135517] Lustre: Skipped 77 previous similar messages [91998.309472] Lustre: MGS: Connection restored to fafa8a83-30be-8b20-8ae9-2f55b23d7e15 (at 10.151.8.47@o2ib) [91998.309478] Lustre: Skipped 93 previous similar messages [92628.148173] Lustre: MGS: Connection restored to b80dde3a-fa22-1aab-8fa6-25d0fb6d956f (at 10.151.31.187@o2ib) [92628.148179] Lustre: Skipped 153 previous similar messages [93231.326101] Lustre: MGS: Connection restored to e7c0c45e-3885-a530-7fd8-8b4d4a2e6045 (at 10.151.3.40@o2ib) [93231.326108] Lustre: Skipped 327 previous similar messages [93850.305748] Lustre: MGS: Connection restored to eab002e9-341f-f4d4-daa3-b7d874f21f7b (at 10.151.31.150@o2ib) [93850.305754] Lustre: Skipped 195 previous similar messages [94184.474813] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [94184.508027] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.116@o2ib (303): c: 32, oc: 0, rc: 32 [94520.032656] Lustre: MGS: Connection restored to 084bee6b-ec4f-00bd-3d1e-1f737f3f7e2d (at 10.151.35.137@o2ib) [94520.032662] Lustre: Skipped 105 previous similar messages [95120.331404] Lustre: nbp8-MDT0000: Connection restored to 0f1c9f85-b0aa-8463-a012-18f9065e3a86 (at 10.149.3.72@o2ib313) [95120.331409] Lustre: Skipped 284 previous similar messages [95726.190047] Lustre: MGS: Connection restored to a9cd08f8-a6e0-378a-b962-f8b889250fdf (at 10.151.23.122@o2ib) [95726.190053] Lustre: Skipped 8 previous similar messages [96544.930701] Lustre: MGS: Connection restored to 9b9f6d7b-99de-f7f0-03b4-96d69e25b803 (at 10.151.28.71@o2ib) [96544.930706] Lustre: Skipped 147 previous similar messages [97415.364396] Lustre: MGS: Connection restored to 5e99824c-c4e1-f6bb-9970-5aa2d9f87fc4 (at 10.151.38.117@o2ib) [97415.364402] Lustre: Skipped 81 previous similar messages [98133.183392] Lustre: MGS: Connection restored to 61770e92-e8b7-5bbb-3e9b-6ec38523b45a (at 10.151.32.135@o2ib) [98133.183398] Lustre: Skipped 71 previous similar messages [98750.695442] Lustre: MGS: Connection restored to 2ffa08a2-60ef-5aba-3a30-c060617e1232 (at 10.151.55.189@o2ib) [98750.695447] Lustre: Skipped 37 previous similar messages [99477.955020] Lustre: MGS: Connection restored to 213ce36d-196b-e655-e6ff-aae1369a9ff3 (at 10.151.23.202@o2ib) [99477.955026] Lustre: Skipped 577 previous similar messages [100300.344692] Lustre: MGS: Connection restored to 352f937a-1501-76db-4c11-be1739cf0d49 (at 10.151.29.248@o2ib) [100300.344697] Lustre: Skipped 121 previous similar messages [100900.942883] Lustre: MGS: Connection restored to 287e5db3-64db-5d1e-09a7-e5ea40043d7c (at 10.149.2.78@o2ib313) [100900.942889] Lustre: Skipped 373 previous similar messages [101500.972224] Lustre: MGS: Connection restored to b5556e4c-85dd-c855-f85d-d20be828580d (at 10.153.17.100@o2ib233) [101500.972230] Lustre: Skipped 1455 previous similar messages [102111.661420] Lustre: MGS: Connection restored to f041938a-9f4f-f146-07f8-06c709c1a4e1 (at 10.149.5.19@o2ib313) [102111.661426] Lustre: Skipped 527 previous similar messages [102240.858982] Lustre: nbp8-MDT0000: haven't heard from client d09a0fd3-c149-166b-8cf8-0ffd7ab799ac (at 10.153.13.77@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e9a54400, cur 1590784364 expire 1590784214 last 1590784137 [102240.932230] Lustre: Skipped 1 previous similar message [103002.719145] Lustre: MGS: Connection restored to effe69f9-6511-c514-dc41-d52d92ed88ad (at 10.151.7.37@o2ib) [103002.719151] Lustre: Skipped 101 previous similar messages [103638.731027] Lustre: MGS: Connection restored to 3555158b-c47b-61b9-2a90-24735dfae9e6 (at 10.151.14.194@o2ib) [103638.731033] Lustre: Skipped 139 previous similar messages [104311.128330] Lustre: MGS: Connection restored to f5416920-2d86-6df2-f113-b9fe496603cb (at 10.151.38.36@o2ib) [104311.128336] Lustre: Skipped 253 previous similar messages [104943.121522] Lustre: MGS: Connection restored to 8b7e9a8a-d6d1-3818-9797-c6db7b01ce44 (at 10.151.32.23@o2ib) [104943.121532] Lustre: Skipped 97 previous similar messages [105659.824829] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [105659.824835] Lustre: Skipped 131 previous similar messages [105694.983569] Lustre: nbp8-MDT0000: haven't heard from client fd5a43b5-6797-1f88-2a00-af5da951abfd (at 10.149.1.69@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3cc00f000, cur 1590787818 expire 1590787668 last 1590787591 [105695.056538] Lustre: Skipped 1 previous similar message [106267.921452] Lustre: MGS: Connection restored to 411eaac0-8258-869b-92e7-fe850fbf82fc (at 10.151.19.142@o2ib) [106267.921458] Lustre: Skipped 155 previous similar messages [106868.558829] Lustre: MGS: Connection restored to 56e21ce4-7431-1907-89fb-689589a46734 (at 10.151.42.136@o2ib) [106868.558834] Lustre: Skipped 233 previous similar messages [107472.072243] Lustre: MGS: Connection restored to a35076a8-142a-dd99-1899-2bdcc5254f68 (at 10.151.7.202@o2ib) [107472.072249] Lustre: Skipped 131 previous similar messages [108309.759936] Lustre: MGS: Connection restored to f017008b-6d2f-8fd1-d1ca-2b40fefc465c (at 10.151.10.19@o2ib) [108309.759942] Lustre: Skipped 185 previous similar messages [108945.512046] Lustre: MGS: Connection restored to 298e6afc-e65c-a882-5ab2-e4de19e1a80f (at 10.151.32.154@o2ib) [108945.512052] Lustre: Skipped 339 previous similar messages [109369.028398] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [109369.061900] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.28@o2ib (306): c: 31, oc: 0, rc: 32 [109637.316925] Lustre: MGS: Connection restored to f5303ff0-3f2b-4994-eeab-a06f75a9211b (at 10.151.24.96@o2ib) [109637.316930] Lustre: Skipped 181 previous similar messages [110498.646435] Lustre: MGS: Connection restored to 352f937a-1501-76db-4c11-be1739cf0d49 (at 10.151.29.248@o2ib) [110498.646441] Lustre: Skipped 131 previous similar messages [111194.955453] Lustre: MGS: Connection restored to 2978f6ac-4c7b-a5f0-5826-62aaa4dbffe4 (at 10.141.2.95@o2ib417) [111194.955459] Lustre: Skipped 97 previous similar messages [111850.092957] Lustre: MGS: Connection restored to 87ca0228-d756-5f9a-0c88-408d41649a90 (at 10.151.37.247@o2ib) [111850.092962] Lustre: Skipped 127 previous similar messages [112484.072133] Lustre: MGS: Connection restored to 62168799-ffd3-66ba-8c23-becd27eff00d (at 10.151.28.200@o2ib) [112484.072139] Lustre: Skipped 61 previous similar messages [113395.038239] Lustre: MGS: Connection restored to 356b75de-9243-965f-27ec-c08f41367e13 (at 10.149.5.111@o2ib313) [113395.038245] Lustre: Skipped 91 previous similar messages [114180.010918] Lustre: MGS: Connection restored to e941b57a-5aba-ea44-9c8c-3c95f8f8ace5 (at 10.151.52.44@o2ib) [114180.010924] Lustre: Skipped 151 previous similar messages [115306.281529] Lustre: MGS: Connection restored to 88950724-02b5-9265-7849-2fce64d0fa30 (at 10.151.44.98@o2ib) [115306.281535] Lustre: Skipped 275 previous similar messages [116124.879473] Lustre: MGS: Connection restored to a8e54fa4-a808-bcd6-6163-0bda1c60dba3 (at 10.149.11.16@o2ib313) [116124.879479] Lustre: Skipped 83 previous similar messages [116751.300749] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [116751.334259] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.52.127@o2ib (291): c: 32, oc: 0, rc: 32 [116760.934707] Lustre: MGS: Connection restored to 4b77d8a2-3291-222e-96f1-ca929225d169 (at 10.151.44.78@o2ib) [116760.934713] Lustre: Skipped 107 previous similar messages [117369.565094] Lustre: MGS: Connection restored to 86e70108-7c6f-2a01-11f8-02bc89f70429 (at 10.149.1.180@o2ib313) [117369.565100] Lustre: Skipped 39 previous similar messages [118164.305499] Lustre: MGS: Connection restored to 7e0d654d-e856-8c5e-66ea-c7d3f7b934dd (at 10.149.4.94@o2ib313) [118164.305505] Lustre: Skipped 143 previous similar messages [118945.647178] Lustre: MGS: Connection restored to dd3106e3-0240-9d4d-b249-a8cc9e192167 (at 10.151.52.139@o2ib) [118945.647184] Lustre: Skipped 757 previous similar messages [119727.740796] Lustre: MGS: Connection restored to 98ce1dd2-c268-4e1a-9433-704964c4d5a7 (at 10.151.35.201@o2ib) [119727.740802] Lustre: Skipped 209 previous similar messages [120330.668068] Lustre: MGS: Connection restored to 3976dcc5-0a6b-a16f-590d-b992642c8d2b (at 10.151.47.96@o2ib) [120330.668073] Lustre: Skipped 75 previous similar messages [120940.414883] Lustre: MGS: Connection restored to ecd18963-4c52-ec3d-5e91-a846ac8ba99b (at 10.151.31.132@o2ib) [120940.414888] Lustre: Skipped 205 previous similar messages [121600.730284] Lustre: MGS: Connection restored to 9bf3ece5-133f-86eb-33b9-09e94abdaaa2 (at 10.151.29.146@o2ib) [121600.730290] Lustre: Skipped 139 previous similar messages [122397.287906] Lustre: MGS: Connection restored to e7fb9928-bd86-6699-c511-fa7e59bc21b4 (at 10.151.28.183@o2ib) [122397.287913] Lustre: Skipped 75 previous similar messages [122541.601860] Lustre: MGS: haven't heard from client ee25b38a-bf9c-3142-f0dc-d87b407e86d8 (at 10.149.5.119@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89814a47ec00, cur 1590804664 expire 1590804514 last 1590804437 [122541.672536] Lustre: Skipped 1 previous similar message [123064.110701] Lustre: MGS: Connection restored to 4036b63c-6033-d1ad-11b6-1043aa747703 (at 10.151.9.57@o2ib) [123064.110707] Lustre: Skipped 307 previous similar messages [123721.607803] Lustre: MGS: Connection restored to 55523566-3243-b747-97b3-ff6c75f50033 (at 10.151.28.130@o2ib) [123721.607809] Lustre: Skipped 29 previous similar messages [124350.247733] Lustre: MGS: Connection restored to 9d0dab20-7abc-5ce7-1524-6c0844ffa20b (at 10.151.10.239@o2ib) [124350.247739] Lustre: Skipped 241 previous similar messages [124546.674704] Lustre: nbp8-MDT0000: haven't heard from client 2cc1d689-b869-d6d5-8289-f8bfd367b91c (at 10.151.4.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897987bfd400, cur 1590806669 expire 1590806519 last 1590806442 [124546.746810] Lustre: Skipped 1 previous similar message [124655.590497] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124655.623987] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.43@o2ib (334): c: 30, oc: 0, rc: 32 [124675.590266] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124675.623776] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.84@o2ib (347): c: 30, oc: 0, rc: 32 [124753.690671] Lustre: MGS: haven't heard from client 86d47259-d261-4883-5dad-37148723c8c7 (at 10.151.9.27@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a401179800, cur 1590806876 expire 1590806726 last 1590806649 [124753.760203] Lustre: Skipped 3 previous similar messages [124769.684301] Lustre: nbp8-MDT0000: haven't heard from client eb87f8fd-7fdb-376f-ff35-2e483d3d6826 (at 10.151.9.21@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d3348c800, cur 1590806892 expire 1590806742 last 1590806665 [124769.756403] Lustre: Skipped 30 previous similar messages [124846.596471] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124846.629964] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.54@o2ib (302): c: 31, oc: 0, rc: 32 [124849.596615] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124849.630101] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [124849.663296] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.61@o2ib (307): c: 31, oc: 0, rc: 32 [124849.703637] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [124854.596795] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124854.630301] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [124854.663793] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.71@o2ib (311): c: 31, oc: 0, rc: 32 [124854.704135] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [124867.597235] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124867.630735] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [124867.664218] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.96@o2ib (324): c: 31, oc: 0, rc: 32 [124867.704540] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [124887.598013] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [124887.631512] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 11 previous similar messages [124887.665284] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.36@o2ib (344): c: 31, oc: 0, rc: 32 [124887.705625] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 11 previous similar messages [125089.584275] Lustre: MGS: Connection restored to d30b2ab3-ba05-e108-07ec-e19e07bf6135 (at 10.151.54.120@o2ib) [125089.584281] Lustre: Skipped 91 previous similar messages [125570.713203] Lustre: nbp8-MDT0000: haven't heard from client 5ab94935-7ded-7a4e-c562-e6c652ac3858 (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a283e35000, cur 1590807693 expire 1590807543 last 1590807466 [125570.786455] Lustre: Skipped 30 previous similar messages [125975.969817] Lustre: MGS: Connection restored to 236e4718-7ace-3f5e-7684-c92c30825d5d (at 10.151.56.27@o2ib) [125975.969822] Lustre: Skipped 45 previous similar messages [126611.114769] Lustre: MGS: Connection restored to 88adb8d3-126c-bfd6-7d0e-81b2cabbb256 (at 10.151.6.239@o2ib) [126611.114775] Lustre: Skipped 79 previous similar messages [126906.672755] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [126906.706241] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [126906.739721] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.56.20@o2ib (277): c: 32, oc: 0, rc: 32 [126906.780346] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [127250.774518] Lustre: nbp8-MDT0000: haven't heard from client 403f3f46-b3e6-bbc5-9fdc-4098e6dda1ef (at 10.151.11.197@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897acf2b3c00, cur 1590809373 expire 1590809223 last 1590809146 [127250.847227] Lustre: Skipped 1 previous similar message [127345.687795] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [127345.721295] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.197@o2ib (321): c: 30, oc: 0, rc: 32 [127366.688603] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [127366.722098] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.193@o2ib (342): c: 30, oc: 0, rc: 32 [127402.984361] Lustre: MGS: Connection restored to e7c0c45e-3885-a530-7fd8-8b4d4a2e6045 (at 10.151.3.40@o2ib) [127402.984367] Lustre: Skipped 387 previous similar messages [127806.795041] Lustre: nbp8-MDT0000: haven't heard from client 53257e0a-aba3-8e11-369a-46a60b9a4f1b (at 10.151.3.35@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979c838e800, cur 1590809929 expire 1590809779 last 1590809702 [127806.867149] Lustre: Skipped 3 previous similar messages [127906.709340] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [127906.742841] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.35@o2ib (326): c: 30, oc: 0, rc: 32 [128105.171055] Lustre: MGS: Connection restored to 1948f051-fb19-5081-2acd-0180418f3bba (at 10.151.46.159@o2ib) [128105.171061] Lustre: Skipped 171 previous similar messages [128708.798751] Lustre: MGS: Connection restored to 63b15710-3e37-692d-cef5-a0b9ba8f68af (at 10.151.46.209@o2ib) [128708.798757] Lustre: Skipped 259 previous similar messages [129316.927401] Lustre: MGS: Connection restored to cf3b4b13-a65a-3148-9adc-4b93007c0e29 (at 10.149.3.131@o2ib313) [129316.927407] Lustre: Skipped 835 previous similar messages [129954.058201] Lustre: MGS: Connection restored to 6642cc6f-52fb-4354-bb0d-11be78f5f640 (at 10.151.32.247@o2ib) [129954.058207] Lustre: Skipped 181 previous similar messages [130816.486559] Lustre: MGS: Connection restored to 0773b5f7-08d7-b1cf-ba3d-7ce46d84f19b (at 10.149.10.8@o2ib313) [130816.486564] Lustre: Skipped 99 previous similar messages [131459.092774] Lustre: MGS: Connection restored to 01fd61da-b17a-efa8-5a61-32ab773f7ddb (at 10.149.7.146@o2ib313) [131459.092780] Lustre: Skipped 43 previous similar messages [132169.769980] Lustre: MGS: Connection restored to 341a80d3-c0d7-e3ba-5761-1c14491a5644 (at 10.151.8.13@o2ib) [132169.769986] Lustre: Skipped 49 previous similar messages [132772.276302] Lustre: MGS: Connection restored to b91212da-a2a1-65d5-fb82-d356cdb7402e (at 10.151.33.41@o2ib) [132772.276308] Lustre: Skipped 109 previous similar messages [133430.219398] Lustre: MGS: Connection restored to a5d14d76-5e92-e78c-0bf5-fdaad7727071 (at 10.151.45.157@o2ib) [133430.219404] Lustre: Skipped 255 previous similar messages [134054.587496] Lustre: MGS: Connection restored to 30e269f8-f4f1-f6d8-5422-992ee7d1650a (at 10.151.29.45@o2ib) [134054.587501] Lustre: Skipped 103 previous similar messages [134396.038362] Lustre: nbp8-MDT0000: haven't heard from client 5297ab7f-92bb-4722-0e7e-f526174b2c19 (at 10.151.63.40@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d3fbae800, cur 1590816518 expire 1590816368 last 1590816291 [134396.110755] Lustre: Skipped 3 previous similar messages [134478.950166] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [134478.983665] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [134479.016866] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.40@o2ib (310): c: 30, oc: 0, rc: 32 [134479.057502] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [134833.156408] Lustre: MGS: Connection restored to 54a1727a-499b-ae73-d481-3b0e39226e93 (at 10.151.39.212@o2ib) [134833.156413] Lustre: Skipped 391 previous similar messages [135669.428481] Lustre: MGS: Connection restored to 5f29299b-274c-c2dd-1956-1e8f20f65262 (at 10.151.53.187@o2ib) [135669.428491] Lustre: Skipped 161 previous similar messages [136318.980665] Lustre: MGS: Connection restored to 73bdcffc-26d5-393d-ecc6-e195141b977f (at 10.149.2.108@o2ib313) [136318.980671] Lustre: Skipped 83 previous similar messages [136552.122542] Lustre: MGS: haven't heard from client 6cca1aba-dd79-c2ac-1366-88938f33dc83 (at 10.151.10.211@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979d1f9bc00, cur 1590818674 expire 1590818524 last 1590818447 [136552.192618] Lustre: Skipped 1 previous similar message [136569.116290] Lustre: nbp8-MDT0000: haven't heard from client 67f55bd9-c5e5-c5f4-c8dc-b61fb1e2094e (at 10.151.13.139@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a090668800, cur 1590818691 expire 1590818541 last 1590818464 [136569.188987] Lustre: Skipped 30 previous similar messages [136656.029224] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [136656.062732] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.234@o2ib (313): c: 31, oc: 0, rc: 32 [136660.029369] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [136660.062876] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.134@o2ib (317): c: 31, oc: 0, rc: 32 [136663.029433] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [136663.062926] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [136663.096406] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.139@o2ib (321): c: 31, oc: 0, rc: 32 [136663.137328] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [136666.029593] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [136666.063107] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [136666.096594] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.146@o2ib (324): c: 31, oc: 0, rc: 32 [136666.137517] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [136673.030863] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [136673.064368] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [136673.097847] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.19@o2ib (329): c: 31, oc: 0, rc: 32 [136673.138478] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [136691.030512] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [136691.064018] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 13 previous similar messages [136691.097797] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.55@o2ib (348): c: 31, oc: 0, rc: 32 [136691.138424] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 13 previous similar messages [136927.684305] Lustre: MGS: Connection restored to 1ad4b8bd-bed0-bcba-f85a-6bb0b7ce527c (at 10.151.1.32@o2ib) [136927.684311] Lustre: Skipped 101 previous similar messages [137530.643665] Lustre: MGS: Connection restored to 85d28150-e0b6-adbd-53c7-ec3c8114e910 (at 10.151.45.63@o2ib) [137530.643671] Lustre: Skipped 347 previous similar messages [138512.673724] Lustre: MGS: Connection restored to a2da3b28-f65b-91bf-e092-8383540eb28f (at 10.151.35.121@o2ib) [138512.673729] Lustre: Skipped 75 previous similar messages [139179.827189] Lustre: MGS: Connection restored to db453f9f-90b9-326f-203d-32024430a41c (at 10.151.43.28@o2ib) [139179.827195] Lustre: Skipped 63 previous similar messages [140042.969840] Lustre: MGS: Connection restored to e988576e-c7d0-2611-89cf-aef9e3b908e4 (at 10.151.44.68@o2ib) [140042.969846] Lustre: Skipped 29 previous similar messages [140326.164075] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [140326.197583] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.0.88@o2ib (238): c: 32, oc: 0, rc: 32 [140711.825887] Lustre: MGS: Connection restored to 01a9fd6a-2a18-6fdc-933e-c7e1877e9cb5 (at 10.151.0.88@o2ib) [140711.825892] Lustre: Skipped 41 previous similar messages [140840.468875] Process accounting resumed [141339.329106] Lustre: MGS: Connection restored to dd8409e6-ac9d-5d77-4d2c-cde52f3e38fe (at 10.151.2.233@o2ib) [141339.329112] Lustre: Skipped 199 previous similar messages [141958.110060] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [141958.110066] Lustre: Skipped 85 previous similar messages [142613.568107] Lustre: MGS: Connection restored to 256e7d1c-616d-b3ff-0d98-0ba6273c96b6 (at 10.151.55.152@o2ib) [142613.568114] Lustre: Skipped 55 previous similar messages [143453.262332] Lustre: MGS: Connection restored to d30b2ab3-ba05-e108-07ec-e19e07bf6135 (at 10.151.54.120@o2ib) [143453.262338] Lustre: Skipped 59 previous similar messages [144408.793446] Lustre: MGS: Connection restored to 59e5fb3d-6b69-1e80-6c80-0117a31b26e5 (at 10.151.32.47@o2ib) [144408.793451] Lustre: Skipped 25 previous similar messages [145036.094866] Lustre: MGS: Connection restored to c32fcc7f-e908-1576-6392-ea1adea1b702 (at 10.151.30.36@o2ib) [145036.094872] Lustre: Skipped 257 previous similar messages [145741.306276] Lustre: MGS: Connection restored to 7fc23e5a-4459-97ef-d61e-12bcee2a5efa (at 10.151.49.110@o2ib) [145741.306282] Lustre: Skipped 103 previous similar messages [146453.775660] Lustre: MGS: Connection restored to e43902c6-bd46-d9e3-751a-a9b02ef0a58a (at 10.151.12.72@o2ib) [146453.775666] Lustre: Skipped 15 previous similar messages [147099.461488] Lustre: MGS: Connection restored to 4fc790b9-c3c0-0d2f-635e-9a3a6fa6b5fc (at 10.151.44.140@o2ib) [147099.461494] Lustre: Skipped 11 previous similar messages [147336.510794] Lustre: nbp8-MDT0000: haven't heard from client 419eb282-86ab-fade-755b-157b7bdf89db (at 10.149.4.47@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2d8625400, cur 1590829458 expire 1590829308 last 1590829231 [147336.583800] Lustre: Skipped 30 previous similar messages [147478.425852] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [147478.459349] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.147@o2ib (338): c: 30, oc: 0, rc: 32 [147738.212749] Lustre: MGS: Connection restored to 46d6d806-37b0-e7ca-5989-622052ace292 (at 10.151.32.140@o2ib) [147738.212755] Lustre: Skipped 37 previous similar messages [147882.530563] Lustre: nbp8-MDT0000: haven't heard from client 4222981f-729b-bdc8-6f5b-43b9fb4397ba (at 10.149.2.231@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f567e0400, cur 1590830004 expire 1590829854 last 1590829777 [147882.603834] Lustre: Skipped 9 previous similar messages [147958.533320] Lustre: nbp8-MDT0000: haven't heard from client 0d1b2cc7-ee44-1949-a81b-c9bf27d4169c (at 10.151.29.88@o2ib) in 212 seconds. I think it's dead, and I am evicting it. exp ffff89a040be5000, cur 1590830080 expire 1590829930 last 1590829868 [147958.605719] Lustre: Skipped 13 previous similar messages [148029.445918] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [148029.479422] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.43@o2ib (321): c: 30, oc: 0, rc: 32 [148070.447472] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [148070.480973] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.18@o2ib (322): c: 30, oc: 0, rc: 32 [148075.448653] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [148075.482144] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [148075.515631] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.183@o2ib (329): c: 30, oc: 0, rc: 32 [148075.556544] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [148392.164056] Lustre: MGS: Connection restored to 0f6abe20-2ce7-7a91-deab-ea99275f92d8 (at 10.151.34.75@o2ib) [148392.164062] Lustre: Skipped 41 previous similar messages [148905.567997] Lustre: nbp8-MDT0000: haven't heard from client 8690fa14-5bc9-6458-b00f-7a0f61e75558 (at 10.151.38.121@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ec2f7d000, cur 1590831027 expire 1590830877 last 1590830800 [148905.640705] Lustre: Skipped 7 previous similar messages [148994.481218] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [148994.514717] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [148994.547928] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.121@o2ib (316): c: 30, oc: 0, rc: 32 [148994.588817] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [149229.051874] Lustre: MGS: Connection restored to 2c527fbf-71da-9a29-4586-494890ab2e85 (at 10.141.6.55@o2ib417) [149229.051880] Lustre: Skipped 121 previous similar messages [149496.588496] Lustre: nbp8-MDT0000: haven't heard from client fad14f0f-882d-5dc1-bd8c-927d59cde1d1 (at 10.141.2.100@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e2e5a000, cur 1590831618 expire 1590831468 last 1590831391 [149496.661773] Lustre: Skipped 1 previous similar message [149920.625939] Lustre: MGS: Connection restored to 3e5dd4d8-6bc6-cb66-b1d8-091472a18673 (at 10.151.33.124@o2ib) [149920.625944] Lustre: Skipped 91 previous similar messages [150379.532746] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [150379.566241] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.52@o2ib (287): c: 32, oc: 0, rc: 32 [150656.276392] Lustre: MGS: Connection restored to 5c8ebccc-dccb-c8ea-f217-10323fc93e2d (at 10.151.12.201@o2ib) [150656.276398] Lustre: Skipped 21 previous similar messages [152106.750466] Lustre: MGS: Connection restored to c88b3763-5343-8aee-bd66-097dd28521d1 (at 10.151.31.163@o2ib) [152106.750472] Lustre: Skipped 51 previous similar messages [153039.437394] Lustre: MGS: Connection restored to 0b220cdd-6a7a-47f1-e185-58c97fda9444 (at 10.151.0.145@o2ib) [153039.437400] Lustre: Skipped 15 previous similar messages [153233.703656] Lustre: MGS: Connection restored to 83408335-c336-205f-dc0c-b1ab69844e82 (at 10.141.3.211@o2ib417) [153233.703661] Lustre: Skipped 5 previous similar messages [153947.904803] Lustre: MGS: Connection restored to 7e4eb75b-3b41-6902-0d22-1de796666cb2 (at 10.151.54.159@o2ib) [153947.904809] Lustre: Skipped 35 previous similar messages [154169.589548] Lustre: MGS: Connection restored to 76c27a47-b5c5-e193-a026-0229aa7c01b0 (at 10.151.28.26@o2ib) [154169.589554] Lustre: Skipped 19 previous similar messages [154212.480612] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [154212.480618] Lustre: Skipped 3 previous similar messages [154313.608962] Lustre: MGS: Connection restored to 752c96bf-a702-4b5a-9b36-6e9481a54cdd (at 10.151.42.154@o2ib) [154313.608967] Lustre: Skipped 1 previous similar message [154554.268010] Lustre: MGS: Connection restored to 02094384-8149-a16d-4704-d5fa53dfef9c (at 10.151.46.123@o2ib) [154554.268016] Lustre: Skipped 15 previous similar messages [154605.690350] Lustre: MGS: Connection restored to cec19be5-8e6b-8baa-1007-218ce6112161 (at 10.151.19.220@o2ib) [154605.690355] Lustre: Skipped 231 previous similar messages [154779.468699] Lustre: MGS: Connection restored to 9832cd94-15aa-2b68-189a-a94c87e898e3 (at 10.149.4.106@o2ib313) [154779.468705] Lustre: Skipped 1 previous similar message [155102.678868] Lustre: MGS: Connection restored to b1de6b7f-9681-fc8a-0b52-d5075c812369 (at 10.149.1.28@o2ib313) [155102.678874] Lustre: Skipped 61 previous similar messages [157313.961782] Lustre: MGS: Connection restored to d90b2a41-c662-0545-b9d1-05efc695b7e0 (at 10.151.12.69@o2ib) [157313.961786] Lustre: Skipped 283 previous similar messages [157759.380953] Lustre: MGS: Connection restored to 8db08479-29d8-f2a4-bbb0-bceb29eb25b1 (at 10.149.5.90@o2ib313) [157759.380958] Lustre: Skipped 1 previous similar message [158047.334250] Lustre: MGS: Connection restored to 7b3686c1-ef5b-d5f8-5f4c-7101a5662cf2 (at 10.151.28.231@o2ib) [158047.334258] Lustre: Skipped 5 previous similar messages [158131.816079] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [158131.849578] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.129@o2ib (303): c: 32, oc: 0, rc: 32 [158212.743933] Lustre: MGS: Connection restored to 08e007d1-6387-7e7e-dd23-caa506bd1fa2 (at 10.151.12.116@o2ib) [158212.743939] Lustre: Skipped 57 previous similar messages [158642.022132] Lustre: MGS: Connection restored to 1397bd8e-8f8a-aa7e-685f-b819f8a8f0e2 (at 10.151.46.126@o2ib) [158642.022138] Lustre: Skipped 5 previous similar messages [159347.364666] Lustre: MGS: Connection restored to 0e8d7337-649e-af76-e03a-af274d8d9e18 (at 10.151.39.103@o2ib) [159347.364672] Lustre: Skipped 153 previous similar messages [160040.803762] Lustre: MGS: Connection restored to d3047c11-1425-054f-77fb-7c2b67fb96c1 (at 10.151.45.101@o2ib) [160040.803768] Lustre: Skipped 29 previous similar messages [160350.897575] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [160350.931077] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.123@o2ib (303): c: 32, oc: 0, rc: 32 [160699.151420] Lustre: MGS: Connection restored to 04c64128-92d2-ff43-6e86-17a07c0d4e6c (at 10.151.3.148@o2ib) [160699.151426] Lustre: Skipped 201 previous similar messages [160756.912487] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [160756.945982] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.127@o2ib (303): c: 32, oc: 0, rc: 32 [160942.009292] Lustre: nbp8-MDT0000: haven't heard from client fb275ca7-0c31-91dc-a14c-4f8306241613 (at 10.149.7.227@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f187c1000, cur 1590843063 expire 1590842913 last 1590842836 [160942.082612] Lustre: Skipped 1 previous similar message [161444.581597] Lustre: MGS: Connection restored to 50014ec2-8371-ec58-49fe-20e23770ef85 (at 10.151.28.206@o2ib) [161444.581603] Lustre: Skipped 3 previous similar messages [162120.289923] Lustre: MGS: Connection restored to f2be5dc8-d6ba-01b3-723a-70f0412d74c7 (at 10.149.14.172@o2ib313) [162120.289928] Lustre: Skipped 127 previous similar messages [162903.658944] Lustre: MGS: Connection restored to a8195a24-3fde-7dce-390c-419f4437b521 (at 10.151.8.63@o2ib) [162903.658950] Lustre: Skipped 163 previous similar messages [162992.994638] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [162993.028131] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.197@o2ib (285): c: 32, oc: 0, rc: 32 [163483.012620] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [163483.046124] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.125@o2ib (209): c: 32, oc: 0, rc: 32 [163551.547602] Lustre: MGS: Connection restored to acead0f2-a35e-fce9-b4a2-811fe16d18ef (at 10.151.6.205@o2ib) [163551.547608] Lustre: Skipped 263 previous similar messages [164430.701652] Lustre: MGS: Connection restored to 4f9340e2-502a-3414-f02a-490721186950 (at 10.151.11.13@o2ib) [164430.701658] Lustre: Skipped 91 previous similar messages [165039.680265] Lustre: MGS: Connection restored to c3e41625-7382-ced5-cd88-3ba3c8b9a5f1 (at 10.151.32.22@o2ib) [165039.680271] Lustre: Skipped 361 previous similar messages [165715.332360] Lustre: MGS: Connection restored to f0a93947-e845-b965-5b0f-3115166834b7 (at 10.149.10.48@o2ib313) [165715.332365] Lustre: Skipped 91 previous similar messages [166375.048248] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [166375.048254] Lustre: Skipped 83 previous similar messages [167098.112119] Lustre: MGS: Connection restored to 1db5c6a7-7536-e061-2c73-b432bedaa6b6 (at 10.151.19.12@o2ib) [167098.112125] Lustre: Skipped 39 previous similar messages [167764.158144] Lustre: MGS: Connection restored to ad7eca1f-951c-3cd4-68c0-c353555c68f9 (at 10.151.44.26@o2ib) [167764.158150] Lustre: Skipped 435 previous similar messages [168479.802243] Lustre: MGS: Connection restored to f5912cf8-6200-a546-dfd6-3aef874f442d (at 10.149.11.71@o2ib313) [168479.802249] Lustre: Skipped 369 previous similar messages [169137.695753] Lustre: MGS: Connection restored to 76c01b4d-38c9-ce7d-8761-c6872414dd3c (at 10.151.32.21@o2ib) [169137.695759] Lustre: Skipped 277 previous similar messages [169872.812125] Lustre: MGS: Connection restored to 3f03c423-1840-ea18-b183-5810f0854488 (at 10.149.15.123@o2ib313) [169872.812131] Lustre: Skipped 323 previous similar messages [170622.635436] Lustre: MGS: Connection restored to eadc71b6-9b3d-6f03-cebb-950c603deb6d (at 10.151.56.150@o2ib) [170622.635443] Lustre: Skipped 427 previous similar messages [171764.814425] Lustre: MGS: Connection restored to f667f8ad-38ae-7270-dbe2-58fd105cdd5f (at 10.151.28.221@o2ib) [171764.814430] Lustre: Skipped 31 previous similar messages [172410.501014] Lustre: MGS: Connection restored to 9fbf0cb8-baea-c078-de2d-129cd8080ab1 (at 10.151.30.211@o2ib) [172410.501020] Lustre: Skipped 9 previous similar messages [173052.898812] Lustre: MGS: Connection restored to b1dad37c-0596-bf45-6fbe-93abeb61adc4 (at 10.151.19.126@o2ib) [173052.898818] Lustre: Skipped 101 previous similar messages [173672.368641] Lustre: MGS: Connection restored to 866c4eff-b146-65e5-e50c-0f750d6dd23c (at 10.151.32.27@o2ib) [173672.368650] Lustre: Skipped 395 previous similar messages [174274.898877] Lustre: MGS: Connection restored to da8d1d43-d6c7-9aa4-3369-24d1d0028bdb (at 10.149.10.107@o2ib313) [174274.898883] Lustre: Skipped 37 previous similar messages [174881.740195] Lustre: MGS: Connection restored to 3a2a3cee-bb6b-6158-e9bc-363e38492ee9 (at 10.141.6.71@o2ib417) [174881.740201] Lustre: Skipped 81 previous similar messages [175508.957476] Lustre: MGS: Connection restored to 454f2615-8b9b-17b9-9e94-c9a637fd40b2 (at 10.141.2.135@o2ib417) [175508.957482] Lustre: Skipped 397 previous similar messages [176136.017760] Lustre: MGS: Connection restored to e154ec6d-9a5f-79db-1831-5e3efeb7449e (at 10.153.10.10@o2ib233) [176136.017767] Lustre: Skipped 81 previous similar messages [176314.572758] Lustre: nbp8-MDT0000: haven't heard from client 76ccace1-4164-2bd7-bf57-0ed79d064ec1 (at 10.151.29.44@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a085b32c00, cur 1590858435 expire 1590858285 last 1590858208 [176314.645166] Lustre: Skipped 1 previous similar message [176388.485455] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [176388.518953] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.45@o2ib (308): c: 31, oc: 0, rc: 32 [176395.486702] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [176395.520203] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.59@o2ib (350): c: 31, oc: 0, rc: 32 [176438.487273] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [176438.520764] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.44@o2ib (350): c: 30, oc: 0, rc: 32 [176447.487556] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [176447.521049] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.61@o2ib (350): c: 30, oc: 0, rc: 32 [176742.281857] Lustre: MGS: Connection restored to ad1617f4-b06f-6e54-b353-91ab16bdd5b3 (at 10.151.10.72@o2ib) [176742.281863] Lustre: Skipped 945 previous similar messages [177519.497132] Lustre: MGS: Connection restored to 76994480-6154-b874-b2a8-53f9772cc6e7 (at 10.141.2.184@o2ib417) [177519.497138] Lustre: Skipped 93 previous similar messages [178146.152329] Lustre: MGS: Connection restored to cf170eda-9f05-c2bc-b22f-dd80ba63466f (at 10.151.30.41@o2ib) [178146.152335] Lustre: Skipped 379 previous similar messages [178781.663784] Lustre: nbp8-MDT0000: haven't heard from client 130e3b03-e9e0-0671-dfc3-ca83fd9cab60 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897fb3f69400, cur 1590860902 expire 1590860752 last 1590860675 [178781.737066] Lustre: Skipped 3 previous similar messages [179243.469720] Lustre: MGS: Connection restored to e154ec6d-9a5f-79db-1831-5e3efeb7449e (at 10.153.10.10@o2ib233) [179243.469726] Lustre: Skipped 493 previous similar messages [179853.267409] Lustre: MGS: Connection restored to eb120f55-295a-e301-45bd-3e0f9785ba7d (at 10.151.55.146@o2ib) [179853.267415] Lustre: Skipped 953 previous similar messages [180465.864223] Lustre: MGS: Connection restored to 58ee1f8f-a40e-bcc1-5ecb-c2b5148d18fa (at 10.151.30.233@o2ib) [180465.864229] Lustre: Skipped 413 previous similar messages [181084.649802] Lustre: MGS: Connection restored to 50fe6650-3e7e-aa8f-53df-97c1d30a66eb (at 10.151.19.165@o2ib) [181084.649808] Lustre: Skipped 3 previous similar messages [181195.661385] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [181195.694884] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.52@o2ib (303): c: 32, oc: 0, rc: 32 [181745.498609] Lustre: MGS: Connection restored to 70ac826c-f725-63b8-a250-156652ed39b9 (at 10.149.3.151@o2ib313) [181745.498614] Lustre: Skipped 279 previous similar messages [182406.705754] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [182406.739255] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.56.119@o2ib (303): c: 32, oc: 0, rc: 32 [182418.426994] Lustre: MGS: Connection restored to e154ec6d-9a5f-79db-1831-5e3efeb7449e (at 10.153.10.10@o2ib233) [182418.426999] Lustre: Skipped 233 previous similar messages [183168.861093] Lustre: MGS: Connection restored to a7d64437-ecbb-242f-b3ec-58bcb0880f55 (at 10.151.32.84@o2ib) [183168.861099] Lustre: Skipped 1021 previous similar messages [183847.346375] Lustre: MGS: Connection restored to f75872e0-3341-a5d5-671d-02c080587815 (at 10.149.16.28@o2ib313) [183847.346381] Lustre: Skipped 141 previous similar messages [184591.169039] Lustre: MGS: Connection restored to 6296cee4-436b-1552-edb0-7ace037f5d8f (at 10.151.32.38@o2ib) [184591.169045] Lustre: Skipped 233 previous similar messages [185365.885800] Lustre: MGS: Connection restored to f5912cf8-6200-a546-dfd6-3aef874f442d (at 10.149.11.71@o2ib313) [185365.885806] Lustre: Skipped 39 previous similar messages [186075.409533] Lustre: MGS: Connection restored to be4f7e0e-b040-c70c-5719-6d3c12143f42 (at 10.151.29.115@o2ib) [186075.409539] Lustre: Skipped 243 previous similar messages [186291.937284] Lustre: nbp8-MDT0000: haven't heard from client 4e9c42e9-6b52-a050-4a5e-31c78eccdc27 (at 10.149.1.90@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f47efe400, cur 1590868412 expire 1590868262 last 1590868185 [186292.010264] Lustre: Skipped 1 previous similar message [186747.830704] Lustre: MGS: Connection restored to 21e556fb-00bb-1737-3d2e-de4b0e06707a (at 10.151.14.88@o2ib) [186747.830709] Lustre: Skipped 239 previous similar messages [187165.971100] Lustre: nbp8-MDT0000: haven't heard from client fdfb1ead-252c-2907-9dc9-0af16172f347 (at 10.149.5.48@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3aa467c00, cur 1590869286 expire 1590869136 last 1590869059 [187166.044089] Lustre: Skipped 3 previous similar messages [187527.048659] Lustre: MGS: Connection restored to ab4ae0d9-731c-7a8c-acf5-c31d954d290b (at 10.151.37.123@o2ib) [187527.048664] Lustre: Skipped 57 previous similar messages [188581.461606] Lustre: MGS: Connection restored to 170e97a9-e559-32e1-fc3a-facb4786cfa4 (at 10.153.10.30@o2ib233) [188581.461612] Lustre: Skipped 29 previous similar messages [189116.952647] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [189116.986158] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.135@o2ib (278): c: 32, oc: 0, rc: 32 [189162.953347] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [189162.986846] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.126@o2ib (299): c: 32, oc: 0, rc: 32 [189629.059653] Lustre: nbp8-MDT0000: haven't heard from client f4593c86-1792-07d3-b2d6-7c72887b5f98 (at 10.149.4.91@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980df0de400, cur 1590871749 expire 1590871599 last 1590871522 [189629.132668] Lustre: Skipped 5 previous similar messages [189637.061317] Lustre: MGS: haven't heard from client 75d017be-3715-b46a-0120-ed930184355e (at 10.149.4.91@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897eab9c6400, cur 1590871757 expire 1590871607 last 1590871530 [189637.131722] Lustre: Skipped 1 previous similar message [190091.033527] Lustre: MGS: Connection restored to da50eb49-334f-3d72-f4b0-f16d59ada3ac (at 10.149.1.48@o2ib313) [190091.033532] Lustre: Skipped 147 previous similar messages [190187.085833] Lustre: MGS: haven't heard from client e0a75f0b-88ed-bd62-5510-6ca1d48d5c55 (at 10.149.2.108@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980df0dd800, cur 1590872307 expire 1590872157 last 1590872080 [190187.156520] Lustre: Skipped 1 previous similar message [190330.491164] Lustre: MGS: Connection restored to bfca701a-d1aa-a6c2-513c-99fdcaf0f48d (at 10.149.3.111@o2ib313) [190330.491170] Lustre: Skipped 45 previous similar messages [190687.504382] Lustre: MGS: Connection restored to 710e8d0d-5908-3785-8b05-6c3c299156f2 (at 10.151.42.152@o2ib) [190687.504387] Lustre: Skipped 5 previous similar messages [191164.465871] Lustre: MGS: Connection restored to 66944729-60c0-c6df-e44f-88e9b0a305d8 (at 10.151.33.126@o2ib) [191164.465877] Lustre: Skipped 31 previous similar messages [191774.579377] Lustre: MGS: Connection restored to d8887404-9482-f94b-819e-8ef5d90d1df0 (at 10.151.38.156@o2ib) [191774.579383] Lustre: Skipped 595 previous similar messages [192400.108963] Lustre: MGS: Connection restored to ff1b3077-1223-25f4-34f8-5057336b733d (at 10.141.6.135@o2ib417) [192400.108969] Lustre: Skipped 97 previous similar messages [193577.291967] Lustre: MGS: Connection restored to a3a1da54-3ea3-dd12-7cb0-6d41cd85ab71 (at 10.151.14.184@o2ib) [193577.291973] Lustre: Skipped 57 previous similar messages [194587.719678] Lustre: MGS: Connection restored to a7d64437-ecbb-242f-b3ec-58bcb0880f55 (at 10.151.32.84@o2ib) [194587.719683] Lustre: Skipped 119 previous similar messages [195222.540775] Lustre: MGS: Connection restored to f34cbd6c-2a3c-c8eb-dacd-a7c5e04ffc3c (at 10.151.22.13@o2ib) [195222.540781] Lustre: Skipped 89 previous similar messages [195865.028922] Lustre: MGS: Connection restored to bd9ba461-200b-1442-d888-cad91bededbd (at 10.149.6.51@o2ib313) [195865.028928] Lustre: Skipped 81 previous similar messages [196485.351493] Lustre: MGS: Connection restored to 9d0eb1c2-9f5d-5532-0edc-0512a98cfce2 (at 10.149.1.1@o2ib313) [196485.351499] Lustre: Skipped 83 previous similar messages [197604.295697] Lustre: MGS: Connection restored to 0ccffbb6-881a-2c60-ddd3-dabf577c025d (at 10.151.8.94@o2ib) [197604.295703] Lustre: Skipped 195 previous similar messages [198597.739290] Lustre: MGS: Connection restored to 2efcc868-3185-ef57-948d-8dccfe40481e (at 10.151.35.251@o2ib) [198597.739295] Lustre: Skipped 165 previous similar messages [199827.996297] Lustre: MGS: Connection restored to 199671bb-7b30-93cb-9541-11952b2e9720 (at 10.149.15.227@o2ib313) [199827.996302] Lustre: Skipped 229 previous similar messages [200247.998725] Lustre: MGS: Connection restored to f3bd9e6b-29ea-a897-6861-5df23a3ce7d7 (at 10.149.3.149@o2ib313) [200247.998731] Lustre: Skipped 59 previous similar messages [200385.455290] Lustre: MGS: haven't heard from client f62cc89f-5969-413b-ede3-f06db42c9649 (at 10.153.10.156@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ef0f25c00, cur 1590882505 expire 1590882355 last 1590882278 [200385.526284] Lustre: Skipped 11 previous similar messages [200416.091567] Lustre: MGS: Connection restored to 7141b3e0-b5a7-68e5-002d-84e5dd333078 (at 10.151.36.245@o2ib) [200416.091573] Lustre: Skipped 3 previous similar messages [200831.654513] Lustre: MGS: Connection restored to d8522f51-9eb8-3e86-a1b0-60ed40952839 (at 10.151.3.179@o2ib) [200831.654519] Lustre: Skipped 25 previous similar messages [201529.458337] Lustre: MGS: Connection restored to e4f2a051-ff16-399d-763d-720c0bee9730 (at 10.149.1.4@o2ib313) [201529.458343] Lustre: Skipped 241 previous similar messages [202199.607057] Lustre: MGS: Connection restored to 90cdb164-6ddb-7b7e-93df-781abb386d99 (at 10.151.24.129@o2ib) [202199.607062] Lustre: Skipped 221 previous similar messages [203015.094969] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [203015.094975] Lustre: Skipped 175 previous similar messages [203675.084885] Lustre: MGS: Connection restored to a78c09bf-393d-8f6d-e893-943157a65419 (at 10.149.10.26@o2ib313) [203675.084890] Lustre: Skipped 5 previous similar messages [204011.587304] Lustre: nbp8-MDT0000: haven't heard from client 02872753-6bbb-468f-84a3-33ed2ae31e38 (at 10.149.1.6@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fe2b47800, cur 1590886131 expire 1590885981 last 1590885904 [204011.660023] Lustre: Skipped 1 previous similar message [204326.430185] Lustre: MGS: Connection restored to 4e4408ab-8834-94dc-837d-7165099ca9b3 (at 10.151.8.76@o2ib) [204326.430190] Lustre: Skipped 333 previous similar messages [204996.458611] Lustre: MGS: Connection restored to b930058a-3d9c-8e6e-ac53-15d5bc088355 (at 10.151.33.84@o2ib) [204996.458617] Lustre: Skipped 211 previous similar messages [205975.448370] Lustre: MGS: Connection restored to 5df1551a-3e8d-1005-12a6-30d0edf22592 (at 10.151.3.59@o2ib) [205975.448376] Lustre: Skipped 197 previous similar messages [206605.898389] Lustre: MGS: Connection restored to 92fe2e1b-f270-4401-3600-9a66b1703739 (at 10.151.54.126@o2ib) [206605.898395] Lustre: Skipped 121 previous similar messages [207309.849108] Lustre: MGS: Connection restored to c6cbc796-c252-5c74-dc59-8df0d72a2eee (at 10.151.55.160@o2ib) [207309.849114] Lustre: Skipped 153 previous similar messages [207910.341905] Lustre: MGS: Connection restored to 77192031-d44f-e31e-7073-d9b8c21e8866 (at 10.151.11.73@o2ib) [207910.341911] Lustre: Skipped 443 previous similar messages [208753.579966] Lustre: MGS: Connection restored to 8b4f7da7-69b6-7cc6-a73a-c8dd4b879b3e (at 10.151.51.219@o2ib) [208753.579972] Lustre: Skipped 479 previous similar messages [209359.434000] Lustre: MGS: Connection restored to d971419a-c125-5917-5088-53240cba29cf (at 10.151.47.90@o2ib) [209359.434006] Lustre: Skipped 75 previous similar messages [210078.430472] Lustre: MGS: Connection restored to 00062ffe-9beb-0c95-4d8a-a286495ea877 (at 10.151.1.94@o2ib) [210078.430477] Lustre: Skipped 249 previous similar messages [211033.567110] Lustre: MGS: Connection restored to 64f6f2c9-0bc1-ae05-b286-02fe0809c778 (at 10.151.54.162@o2ib) [211033.567116] Lustre: Skipped 133 previous similar messages [211833.951828] Lustre: MGS: Connection restored to 85da14c3-96de-5390-1a62-4371e504a1f5 (at 10.141.5.225@o2ib417) [211833.951834] Lustre: Skipped 65 previous similar messages [212460.436850] Lustre: MGS: Connection restored to 96b552f4-29bc-44d4-c4eb-9f4fb003a5da (at 10.141.6.61@o2ib417) [212460.436856] Lustre: Skipped 93 previous similar messages [213096.463825] Lustre: MGS: Connection restored to 31ac9602-9317-0433-a718-972633e1ca15 (at 10.149.1.6@o2ib313) [213096.463831] Lustre: Skipped 117 previous similar messages [213730.560139] Lustre: MGS: Connection restored to 87f19c83-5f0b-c2a4-65e6-7d6921f4cc25 (at 10.151.54.94@o2ib) [213730.560145] Lustre: Skipped 13 previous similar messages [214439.438996] Lustre: MGS: Connection restored to 71c0b54c-03f4-d80e-6e60-bbe2fdace5f7 (at 10.151.38.108@o2ib) [214439.439002] Lustre: Skipped 507 previous similar messages [215113.122323] Lustre: MGS: Connection restored to 5421c3f0-0b17-8631-d684-3c3fae1afee4 (at 10.151.11.185@o2ib) [215113.122329] Lustre: Skipped 127 previous similar messages [215767.074615] Lustre: MGS: Connection restored to fefeedd7-b459-cb4a-295c-a184cae9a961 (at 10.151.4.64@o2ib) [215767.074621] Lustre: Skipped 7 previous similar messages [217095.194676] Lustre: MGS: Connection restored to 400be2be-0deb-98dc-fd32-a96626a77e7e (at 10.151.4.53@o2ib) [217095.194682] Lustre: Skipped 135 previous similar messages [217319.391496] Lustre: MGS: Connection restored to 6d2cdec9-9833-1c64-aaa9-958e118cc581 (at 10.149.14.228@o2ib313) [217319.391502] Lustre: Skipped 3 previous similar messages [217536.074785] Lustre: MGS: Connection restored to 4aed73e5-26e5-a503-4e6c-79edec41178d (at 10.151.18.152@o2ib) [217536.074791] Lustre: Skipped 207 previous similar messages [217842.891554] Lustre: MGS: Connection restored to e64a9065-602f-b113-63e4-4598c4399a89 (at 10.149.1.83@o2ib313) [217842.891560] Lustre: Skipped 3 previous similar messages [218630.362037] Lustre: MGS: Connection restored to 5e13c3b8-dd6e-9167-d958-5a42729f1494 (at 10.151.3.191@o2ib) [218630.362042] Lustre: Skipped 137 previous similar messages [219136.051040] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [219136.084534] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.107@o2ib (303): c: 32, oc: 0, rc: 32 [219324.588264] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [219324.588270] Lustre: Skipped 65 previous similar messages [219493.065122] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [219493.098608] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.94@o2ib (301): c: 32, oc: 0, rc: 32 [219495.064204] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [219495.097709] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [219495.131209] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.98@o2ib (304): c: 32, oc: 0, rc: 32 [219495.171567] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [219594.067882] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [219594.101373] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [219594.134574] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.95@o2ib (304): c: 32, oc: 0, rc: 32 [219594.174914] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [219947.296642] Lustre: MGS: Connection restored to 66f6e90d-7ffa-bfd7-68dd-d75842dbd8a6 (at 10.149.4.140@o2ib313) [219947.296647] Lustre: Skipped 15 previous similar messages [220608.254035] Lustre: MGS: Connection restored to 5ba69706-60bd-fc5c-f65a-87a29d554331 (at 10.151.30.113@o2ib) [220608.254041] Lustre: Skipped 179 previous similar messages [220685.198004] Lustre: nbp8-MDT0000: haven't heard from client cdc1c667-e65d-306f-07a4-ff937a0ac115 (at 10.151.19.226@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ec56c6000, cur 1590902804 expire 1590902654 last 1590902577 [220685.270702] Lustre: Skipped 1 previous similar message [220802.112195] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [220802.145702] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [220802.178905] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.226@o2ib (344): c: 30, oc: 0, rc: 32 [220802.219826] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [220819.112755] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [220819.146258] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.144@o2ib (290): c: 32, oc: 0, rc: 32 [221341.499861] Lustre: MGS: Connection restored to fda8e5ee-e754-938d-ae05-838c5769e465 (at 10.151.54.121@o2ib) [221341.499867] Lustre: Skipped 511 previous similar messages [221668.233477] Lustre: nbp8-MDT0000: haven't heard from client ed02e50d-db50-7849-82e6-c2102f5a3f19 (at 10.149.1.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fa3e21c00, cur 1590903787 expire 1590903637 last 1590903560 [221668.306485] Lustre: Skipped 1 previous similar message [221941.252177] Lustre: MGS: haven't heard from client cf959e4c-55b7-95ba-1830-e827ebadd637 (at 10.151.19.226@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897975776400, cur 1590904060 expire 1590903910 last 1590903833 [221941.322295] Lustre: Skipped 3 previous similar messages [221948.243210] Lustre: nbp8-MDT0000: haven't heard from client 3d6fe293-e1ec-3560-f4d1-be20a684542b (at 10.151.19.226@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e4a711800, cur 1590904067 expire 1590903917 last 1590903840 [222065.158466] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [222065.191950] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.226@o2ib (343): c: 31, oc: 0, rc: 32 [222264.353528] Lustre: MGS: Connection restored to 20c94b5c-3bc2-da8c-76a4-e42d5b402030 (at 10.151.4.28@o2ib) [222264.353535] Lustre: Skipped 137 previous similar messages [222560.267108] Lustre: MGS: haven't heard from client 44d12886-6836-deb0-456e-d4f9fa568c96 (at 10.153.16.38@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dcdbc3c00, cur 1590904679 expire 1590904529 last 1590904452 [222567.267015] Lustre: nbp8-MDT0000: haven't heard from client 1d4f9066-da3f-2961-5976-1e74395326ae (at 10.153.16.38@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897eb4e30800, cur 1590904686 expire 1590904536 last 1590904459 [223317.779746] Lustre: MGS: Connection restored to 54fe4958-f967-e953-a351-4bf657b91071 (at 10.151.53.107@o2ib) [223317.779752] Lustre: Skipped 123 previous similar messages [223966.106262] Lustre: MGS: Connection restored to aa97ecd4-3b7f-d76c-0f4f-fdbe0a393acb (at 10.149.9.199@o2ib313) [223966.106268] Lustre: Skipped 243 previous similar messages [224569.513920] Lustre: MGS: Connection restored to eb1d0b66-0c1f-e993-df5a-92e96e558b68 (at 10.149.1.52@o2ib313) [224569.513925] Lustre: Skipped 35 previous similar messages [225209.579959] Lustre: MGS: Connection restored to 36d8a2fd-6d3f-fab0-e271-4ddc77020533 (at 10.151.28.236@o2ib) [225209.579965] Lustre: Skipped 85 previous similar messages [226081.549776] Lustre: MGS: Connection restored to 45fba122-b316-cbb5-7fdf-b65721a0fb01 (at 10.151.3.121@o2ib) [226081.549782] Lustre: Skipped 123 previous similar messages [226874.170519] Lustre: MGS: Connection restored to bd2d7f81-a8c3-6532-9e36-ffe67f5239ef (at 10.141.2.61@o2ib417) [226874.170524] Lustre: Skipped 81 previous similar messages [227003.163189] Process accounting resumed [227495.354358] Lustre: MGS: Connection restored to dea2b915-e1b7-7ebd-b17b-00394a92cb47 (at 10.151.3.130@o2ib) [227495.354364] Lustre: Skipped 103 previous similar messages [228121.065289] Lustre: MGS: Connection restored to 7736954e-ac6f-2f4d-abee-b3f1e09d25e8 (at 10.151.9.136@o2ib) [228121.065294] Lustre: Skipped 115 previous similar messages [228754.688939] Lustre: MGS: Connection restored to 54617e84-6c4b-2e3a-ae54-282a8b606cc7 (at 10.141.2.28@o2ib417) [228754.688946] Lustre: Skipped 25 previous similar messages [229355.357638] Lustre: MGS: Connection restored to 26e0858a-1505-a067-b3be-2c80ce6cb760 (at 10.141.3.232@o2ib417) [229355.357644] Lustre: Skipped 159 previous similar messages [229986.584559] Lustre: MGS: Connection restored to aedf05c2-7619-1a12-a5a6-fb97e5dcc9d6 (at 10.151.36.148@o2ib) [229986.584564] Lustre: Skipped 57 previous similar messages [230625.006060] Lustre: MGS: Connection restored to b5e9023c-12f5-18f1-4f3d-c96c5fe89733 (at 10.141.2.93@o2ib417) [230625.006066] Lustre: Skipped 91 previous similar messages [231272.046481] Lustre: MGS: Connection restored to 6d468527-61d8-b83d-1baa-c21b0457125f (at 10.141.5.6@o2ib417) [231272.046487] Lustre: Skipped 145 previous similar messages [231978.437190] Lustre: MGS: Connection restored to b930058a-3d9c-8e6e-ac53-15d5bc088355 (at 10.151.33.84@o2ib) [231978.437195] Lustre: Skipped 135 previous similar messages [232735.544705] Lustre: MGS: Connection restored to 4feb0e64-ae6a-ead9-3d62-f6e1111ee5ee (at 10.151.9.150@o2ib) [232735.544711] Lustre: Skipped 59 previous similar messages [233466.921575] Lustre: MGS: Connection restored to 05b4156e-d5c3-5ed2-dbdf-a6ab5770c6b9 (at 10.151.9.124@o2ib) [233466.921580] Lustre: Skipped 31 previous similar messages [234115.101623] Lustre: MGS: Connection restored to a99d67b9-bbf3-9058-64dd-5a249fbeedd2 (at 10.151.15.14@o2ib) [234115.101629] Lustre: Skipped 67 previous similar messages [234718.193823] Lustre: MGS: Connection restored to b8978ab3-0f9a-6608-f685-250972ee3cf5 (at 10.141.2.105@o2ib417) [234718.193829] Lustre: Skipped 188 previous similar messages [235361.993586] Lustre: MGS: Connection restored to b670da30-cb83-103d-0982-be89255e7f6e (at 10.151.3.192@o2ib) [235361.993592] Lustre: Skipped 91 previous similar messages [235590.748337] Lustre: MGS: haven't heard from client cec91969-413a-6c11-4650-0059e6d44984 (at 10.151.7.40@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897fcaf61000, cur 1590917709 expire 1590917559 last 1590917482 [235602.744493] Lustre: nbp8-MDT0000: haven't heard from client 01875270-178c-8cd5-a536-66b4623dfcfe (at 10.151.6.63@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cc745e800, cur 1590917721 expire 1590917571 last 1590917494 [235602.816617] Lustre: Skipped 70 previous similar messages [235687.658550] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235687.692048] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.40@o2ib (311): c: 31, oc: 0, rc: 32 [235692.658770] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235692.692269] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.51@o2ib (316): c: 31, oc: 0, rc: 32 [235695.658881] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235695.692381] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [235695.725869] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.57@o2ib (319): c: 31, oc: 0, rc: 32 [235695.766210] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [235699.659952] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [235699.693460] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [235699.726948] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.64@o2ib (323): c: 31, oc: 0, rc: 32 [235699.767298] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [235704.659144] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235704.692627] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [235704.726115] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.20@o2ib (329): c: 31, oc: 0, rc: 32 [235704.766461] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [235716.659596] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235716.693103] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [235716.726584] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.98@o2ib (340): c: 31, oc: 0, rc: 32 [235716.766932] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [235733.660272] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235733.693778] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 15 previous similar messages [235733.727545] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.105@o2ib (314): c: 31, oc: 0, rc: 32 [235733.768469] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 15 previous similar messages [235766.661517] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [235766.695024] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 31 previous similar messages [235766.728790] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.91@o2ib (349): c: 31, oc: 0, rc: 32 [235766.769147] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 31 previous similar messages [236026.588895] Lustre: MGS: Connection restored to 9b94e4b8-a656-a80c-47da-deac5951c76f (at 10.149.15.166@o2ib313) [236026.588900] Lustre: Skipped 173 previous similar messages [236902.853231] Lustre: MGS: Connection restored to 2e86fecc-bee8-0b22-ff22-a242feed6f55 (at 10.151.33.192@o2ib) [236902.853236] Lustre: Skipped 157 previous similar messages [237504.051326] Lustre: MGS: Connection restored to 76c01b4d-38c9-ce7d-8761-c6872414dd3c (at 10.151.32.21@o2ib) [237504.051332] Lustre: Skipped 79 previous similar messages [238595.179827] Lustre: MGS: Connection restored to ad19a607-d676-885e-bdf5-c6c7c231a448 (at 10.149.1.89@o2ib313) [238595.179833] Lustre: Skipped 289 previous similar messages [239287.674360] Lustre: MGS: Connection restored to 4040db01-67f2-2874-dd43-3f69433468b4 (at 10.151.28.110@o2ib) [239287.674366] Lustre: Skipped 85 previous similar messages [240135.771751] Lustre: MGS: Connection restored to b5ed098a-d3bb-a001-86e7-38271b04cb09 (at 10.151.9.158@o2ib) [240135.771760] Lustre: Skipped 85 previous similar messages [240754.518801] Lustre: MGS: Connection restored to afe41bd4-3c7c-959e-85cb-11d7afe24235 (at 10.151.8.157@o2ib) [240754.518806] Lustre: Skipped 219 previous similar messages [241359.225314] Lustre: MGS: Connection restored to f56225d6-a6f9-20f8-7503-240edd9f72de (at 10.151.6.63@o2ib) [241359.225320] Lustre: Skipped 97 previous similar messages [242121.497105] Lustre: MGS: Connection restored to e1c3827f-8378-0090-e29f-1abc0b891bed (at 10.151.12.31@o2ib) [242121.497110] Lustre: Skipped 327 previous similar messages [242780.760392] Lustre: MGS: Connection restored to b35ccd6e-b909-e38d-1613-9499695482c0 (at 10.151.3.78@o2ib) [242780.760397] Lustre: Skipped 247 previous similar messages [243636.666826] Lustre: MGS: Connection restored to c2bf2336-1e0a-a6ae-897e-af927c56651b (at 10.151.3.34@o2ib) [243636.666831] Lustre: Skipped 1 previous similar message [244436.595824] Lustre: MGS: Connection restored to 8b6d5c18-79a8-ce47-6b2c-a03d193cd562 (at 10.149.14.33@o2ib313) [244436.595830] Lustre: Skipped 97 previous similar messages [245374.590401] Lustre: MGS: Connection restored to e5bb5b2b-fc4f-e6fe-a334-118009cc6832 (at 10.149.2.55@o2ib313) [245374.590406] Lustre: Skipped 151 previous similar messages [246147.212727] Lustre: MGS: Connection restored to d241e15f-cf93-e88e-16cd-4d485bea372c (at 10.149.14.174@o2ib313) [246147.212732] Lustre: Skipped 109 previous similar messages [246901.087684] Lustre: MGS: Connection restored to 6cef8b95-2d70-941c-1f5a-99626ab36fcd (at 10.141.2.11@o2ib417) [246901.087690] Lustre: Skipped 243 previous similar messages [247701.748820] Lustre: MGS: Connection restored to 3102d262-dfcf-5a55-9152-5d1ffede99b7 (at 10.151.32.111@o2ib) [247701.748826] Lustre: Skipped 797 previous similar messages [248320.737771] Lustre: MGS: Connection restored to 70b4975c-08ab-619f-4c5a-d611bd9838f2 (at 10.151.29.112@o2ib) [248320.737780] Lustre: Skipped 59 previous similar messages [249302.796024] Lustre: MGS: Connection restored to 757d6ca9-c5e0-b98b-4b76-32bc600b4b83 (at 10.149.11.70@o2ib313) [249302.796030] Lustre: Skipped 201 previous similar messages [249745.175219] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [249745.208709] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [249745.242209] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.87@o2ib (284): c: 32, oc: 0, rc: 32 [249745.282844] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [249755.176622] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [249755.210119] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.107@o2ib (232): c: 32, oc: 0, rc: 32 [249818.177900] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [249818.211400] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [249818.244896] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.31@o2ib (296): c: 32, oc: 0, rc: 32 [249818.285531] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [250177.256008] Lustre: MGS: Connection restored to 341a54e8-0687-be28-77da-95ee7edff1ee (at 10.151.32.34@o2ib) [250177.256019] Lustre: Skipped 213 previous similar messages [250778.117320] Lustre: MGS: Connection restored to 31fcc579-d49d-5bac-4c95-1d3630827b29 (at 10.141.2.212@o2ib417) [250778.117325] Lustre: Skipped 231 previous similar messages [251397.050062] Lustre: MGS: Connection restored to e92d65b9-79e1-4bca-81ab-b56531b75a24 (at 10.151.10.144@o2ib) [251397.050068] Lustre: Skipped 507 previous similar messages [252268.340830] Lustre: MGS: Connection restored to 757ad630-c529-2b5a-a47c-913605314f59 (at 10.151.36.20@o2ib) [252268.340836] Lustre: Skipped 45 previous similar messages [253020.533278] Lustre: MGS: Connection restored to d4a7f315-556b-99db-d3d2-9ba54e2b6bd2 (at 10.151.28.244@o2ib) [253020.533283] Lustre: Skipped 55 previous similar messages [253695.300561] Lustre: MGS: Connection restored to bd27729b-9aea-ca43-8f97-140f9575c053 (at 10.151.30.69@o2ib) [253695.300567] Lustre: Skipped 665 previous similar messages [254343.626686] Lustre: MGS: Connection restored to 50fe6650-3e7e-aa8f-53df-97c1d30a66eb (at 10.151.19.165@o2ib) [254343.626692] Lustre: Skipped 219 previous similar messages [254960.402971] Lustre: MGS: Connection restored to 7a0d5faf-d18b-5224-892e-aac666559533 (at 10.141.4.18@o2ib417) [254960.402977] Lustre: Skipped 81 previous similar messages [255684.483957] Lustre: nbp8-MDT0000: haven't heard from client 6daae159-59fa-7955-5f5d-72a41267e698 (at 10.151.14.132@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2dca8a000, cur 1590937802 expire 1590937652 last 1590937575 [255684.556665] Lustre: Skipped 70 previous similar messages [255763.396746] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [255763.430245] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [255763.463727] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.60@o2ib (304): c: 30, oc: 0, rc: 32 [255763.504077] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [255769.397078] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [255769.430579] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.132@o2ib (311): c: 30, oc: 0, rc: 32 [255797.071894] Lustre: MGS: Connection restored to 5e44c96a-f56a-7fa8-f681-517306800fb7 (at 10.151.32.89@o2ib) [255797.071899] Lustre: Skipped 929 previous similar messages [255811.398564] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [255811.432075] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.216@o2ib (351): c: 30, oc: 0, rc: 32 [256440.110055] Lustre: MGS: Connection restored to 146bc2dd-ca37-9e24-5311-330843c3bab1 (at 10.151.3.107@o2ib) [256440.110060] Lustre: Skipped 199 previous similar messages [257109.210821] Lustre: MGS: Connection restored to 859d4cb8-8600-4a6b-13be-93182d756131 (at 10.151.39.105@o2ib) [257109.210827] Lustre: Skipped 85 previous similar messages [257780.685166] Lustre: MGS: Connection restored to a7bb350d-7386-7557-d865-c51d244d109c (at 10.151.10.217@o2ib) [257780.685172] Lustre: Skipped 69 previous similar messages [258460.657711] Lustre: MGS: Connection restored to 3f80972f-cf3c-4e38-0c71-9cb0c5e758ec (at 10.149.1.188@o2ib313) [258460.657717] Lustre: Skipped 2759 previous similar messages [259159.629452] Lustre: MGS: Connection restored to ebb10982-c7cc-8aa0-e34c-9a045c41eeb8 (at 10.151.30.142@o2ib) [259159.629459] Lustre: Skipped 91 previous similar messages [259771.864739] Lustre: MGS: Connection restored to ab1e8b4e-5618-76fc-01a3-e2e2bdaa2d8c (at 10.149.3.129@o2ib313) [259771.864745] Lustre: Skipped 295 previous similar messages [260476.604155] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [260476.604160] Lustre: Skipped 391 previous similar messages [261291.873868] Lustre: MGS: Connection restored to 388b4895-674e-7b6e-fecd-5dc7ed4cb382 (at 10.149.9.211@o2ib313) [261291.873873] Lustre: Skipped 9 previous similar messages [262389.573133] Lustre: MGS: Connection restored to 2d806f70-e767-ef6a-cac2-0a3812dfb917 (at 10.149.14.135@o2ib313) [262389.573139] Lustre: Skipped 201 previous similar messages [262683.745319] Lustre: nbp8-MDT0000: haven't heard from client 0ba7e72a-4ca6-2a98-51f6-33819619921e (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f54f0c400, cur 1590944801 expire 1590944651 last 1590944574 [262683.818580] Lustre: Skipped 5 previous similar messages [263236.829297] Lustre: MGS: Connection restored to fd774300-17a9-605b-46c8-e7388235fcda (at 10.149.1.239@o2ib313) [263236.829303] Lustre: Skipped 63 previous similar messages [264017.098411] Lustre: MGS: Connection restored to 62168799-ffd3-66ba-8c23-becd27eff00d (at 10.151.28.200@o2ib) [264017.098417] Lustre: Skipped 231 previous similar messages [265033.591037] Lustre: MGS: Connection restored to 4a8a7200-8bb4-784b-9b59-0142254a4cfd (at 10.151.31.173@o2ib) [265033.591043] Lustre: Skipped 121 previous similar messages [265686.929298] Lustre: MGS: Connection restored to 65567ae0-1400-0b9e-e8a3-454385aedccd (at 10.151.38.29@o2ib) [265686.929309] Lustre: Skipped 327 previous similar messages [266338.257843] Lustre: MGS: Connection restored to d2bde30e-a598-b540-a44c-44d50d7aca87 (at 10.151.13.217@o2ib) [266338.257849] Lustre: Skipped 197 previous similar messages [266939.404106] Lustre: MGS: Connection restored to 68f3562d-4e63-e0ec-a3b7-5b919bf6a36a (at 10.151.31.43@o2ib) [266939.404112] Lustre: Skipped 353 previous similar messages [267795.019852] Lustre: MGS: Connection restored to ae3bb913-29f8-b241-5fd8-234542ea3b34 (at 10.151.29.101@o2ib) [267795.019857] Lustre: Skipped 147 previous similar messages [268397.084547] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [268397.084553] Lustre: Skipped 129 previous similar messages [268839.971247] Lustre: MGS: haven't heard from client 5ccf945e-9702-024d-ac48-a3b38368248b (at 10.151.0.52@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e28eda800, cur 1590950957 expire 1590950807 last 1590950730 [268840.040808] Lustre: Skipped 3 previous similar messages [268850.969471] Lustre: nbp8-MDT0000: haven't heard from client 3e5ed764-eca4-7146-436c-118cb15c6b21 (at 10.151.15.10@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ef11da000, cur 1590950968 expire 1590950818 last 1590950741 [268851.041869] Lustre: Skipped 92 previous similar messages [268930.882017] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [268930.915517] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.92@o2ib (306): c: 31, oc: 0, rc: 32 [268933.882136] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [268933.915624] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.199@o2ib (309): c: 31, oc: 0, rc: 32 [268939.882345] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [268939.915838] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 8 previous similar messages [268939.949322] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.211@o2ib (316): c: 31, oc: 0, rc: 32 [268939.990239] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 8 previous similar messages [268948.882696] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [268948.916204] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 23 previous similar messages [268948.949970] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.149@o2ib (318): c: 31, oc: 0, rc: 32 [268948.990599] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 23 previous similar messages [268966.883359] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [268966.916844] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 34 previous similar messages [268966.950611] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.16@o2ib (326): c: 30, oc: 0, rc: 32 [268966.991245] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 34 previous similar messages [269070.738242] Lustre: MGS: Connection restored to de2df681-ae8f-a020-e93e-8aede773ccf7 (at 10.151.30.202@o2ib) [269070.738248] Lustre: Skipped 221 previous similar messages [269711.993571] Lustre: MGS: Connection restored to a00b0069-13ea-bbcb-ea6c-6738d4680787 (at 10.149.9.241@o2ib313) [269711.993576] Lustre: Skipped 353 previous similar messages [270409.026687] Lustre: MGS: haven't heard from client 1ed1f69c-d2d1-f56b-4914-02f37cf72953 (at 10.153.13.77@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e3836cc00, cur 1590952526 expire 1590952376 last 1590952299 [270409.097414] Lustre: Skipped 92 previous similar messages [270930.328187] Lustre: MGS: Connection restored to caa74562-645f-81b2-e38d-598b9d1a7ae7 (at 10.151.32.146@o2ib) [270930.328193] Lustre: Skipped 293 previous similar messages [271078.768109] Lustre: MGS: Connection restored to 229cfaa5-e019-221c-f750-f3a8a21ec9b1 (at 10.151.32.195@o2ib) [271078.768114] Lustre: Skipped 5 previous similar messages [271373.699135] Lustre: MGS: Connection restored to f2cfcd35-36b6-5339-84fc-a54934ea12d2 (at 10.151.35.242@o2ib) [271373.699140] Lustre: Skipped 1 previous similar message [271675.861263] Lustre: MGS: Connection restored to 27d4cc23-a1a4-ecfd-7337-bd3fd009625b (at 10.149.11.35@o2ib313) [271675.861269] Lustre: Skipped 43 previous similar messages [272320.420274] Lustre: MGS: Connection restored to f467ab6f-3bca-16b3-80e2-2138de6e9fcf (at 10.151.33.190@o2ib) [272320.420280] Lustre: Skipped 185 previous similar messages [272934.691386] Lustre: MGS: Connection restored to ad015e40-9327-4d7e-d362-f46707d9990a (at 10.149.3.213@o2ib313) [272934.691392] Lustre: Skipped 357 previous similar messages [273539.792525] Lustre: MGS: Connection restored to c8872b05-8e08-92c2-2259-7129751dd471 (at 10.151.24.94@o2ib) [273539.792531] Lustre: Skipped 715 previous similar messages [274180.131599] Lustre: MGS: Connection restored to 2f32cb12-1966-3d4c-3ed3-a1206cede70d (at 10.151.9.97@o2ib) [274180.131605] Lustre: Skipped 103 previous similar messages [274914.192567] Lustre: nbp8-MDT0000: haven't heard from client 54e4529d-1739-fc93-45ff-14ec76c27a5c (at 10.151.4.115@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f6df68400, cur 1590957031 expire 1590956881 last 1590956804 [274914.264978] Lustre: Skipped 3 previous similar messages [274929.276667] LNet: 4151:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.4.115@o2ib version 12/12 incarnation 1590650077281925/1590956969903788 [274929.329149] Lustre: MGS: Connection restored to fd57c382-2aa3-f6c8-b1f9-afdc9b8d7691 (at 10.151.4.115@o2ib) [274929.329153] Lustre: Skipped 19 previous similar messages [274934.197789] Lustre: MGS: haven't heard from client d162d9ee-9d9f-c71e-b762-6f567e4793f2 (at 10.151.4.115@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899eaa711400, cur 1590957051 expire 1590956901 last 1590956824 [275898.465938] Lustre: MGS: Connection restored to d5010093-5631-91e0-8a07-c425428c746c (at 10.151.29.155@o2ib) [275898.465943] Lustre: Skipped 207 previous similar messages [276583.766346] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [276583.766351] Lustre: Skipped 449 previous similar messages [277358.562344] Lustre: MGS: Connection restored to 88050e70-b88f-1a66-ca16-7abbae0cc7cf (at 10.151.6.72@o2ib) [277358.562350] Lustre: Skipped 49 previous similar messages [277998.248064] Lustre: MGS: Connection restored to ef03d887-eed6-d4ff-4d00-06a466bd214c (at 10.151.37.77@o2ib) [277998.248069] Lustre: Skipped 113 previous similar messages [278635.289704] Lustre: MGS: Connection restored to f0202ddd-358a-a141-0cc0-b35f22947721 (at 10.151.31.227@o2ib) [278635.289710] Lustre: Skipped 285 previous similar messages [279243.416324] Lustre: MGS: Connection restored to 73ea9bc9-d9f2-20a6-3d26-c0a370b601b2 (at 10.151.35.252@o2ib) [279243.416329] Lustre: Skipped 197 previous similar messages [279857.008740] Lustre: MGS: Connection restored to 9b326615-9480-5d14-f33b-f0b3742f6ca9 (at 10.151.3.124@o2ib) [279857.008745] Lustre: Skipped 129 previous similar messages [280548.400063] Lustre: MGS: Connection restored to e3058b17-2d03-29e3-f6e0-7b806ee9966d (at 10.149.10.46@o2ib313) [280548.400069] Lustre: Skipped 85 previous similar messages [281021.416832] Lustre: nbp8-MDT0000: haven't heard from client 44d3f5d1-2973-73c4-6eff-7094322e96ed (at 10.151.3.41@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a033fd9800, cur 1590963138 expire 1590962988 last 1590962911 [281030.417021] Lustre: MGS: haven't heard from client b54162ef-dfd3-321f-7158-07da13a8db0b (at 10.151.3.41@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897901578800, cur 1590963147 expire 1590962997 last 1590962920 [281132.330450] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [281132.363949] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 23 previous similar messages [281132.397738] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.41@o2ib (329): c: 31, oc: 0, rc: 32 [281132.438088] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 23 previous similar messages [281227.067623] Lustre: MGS: Connection restored to a00b0069-13ea-bbcb-ea6c-6738d4680787 (at 10.149.9.241@o2ib313) [281227.067629] Lustre: Skipped 223 previous similar messages [282117.400710] Lustre: MGS: Connection restored to 60f94c13-6f4d-9245-72c1-3a1d2e49038d (at 10.151.3.41@o2ib) [282117.400716] Lustre: Skipped 157 previous similar messages [282735.298278] Lustre: MGS: Connection restored to d0df8f7b-7c48-7d3e-53f6-2ee6ca69e579 (at 10.151.15.61@o2ib) [282735.298283] Lustre: Skipped 191 previous similar messages [283994.742880] Lustre: MGS: Connection restored to 5a633082-e61c-c493-0412-a310897f37ba (at 10.151.36.82@o2ib) [283994.742887] Lustre: Skipped 159 previous similar messages [284073.139275] Lustre: MGS: Connection restored to 80d95b86-8ad6-d170-a831-231d3a953fce (at 10.151.36.195@o2ib) [284073.139281] Lustre: Skipped 13 previous similar messages [284334.064393] Lustre: MGS: Connection restored to 4e56664b-1a3e-00b7-b794-624b576ed99b (at 10.141.4.56@o2ib417) [284334.064399] Lustre: Skipped 1 previous similar message [284804.803874] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [284804.803881] Lustre: Skipped 371 previous similar messages [285679.019321] Lustre: MGS: Connection restored to aae5d6c0-5d71-c32d-6ddb-1d51b00209f5 (at 10.149.14.145@o2ib313) [285679.019327] Lustre: Skipped 147 previous similar messages [286363.397527] Lustre: MGS: Connection restored to 56042238-17ff-e43c-7a9f-3d2465e58f19 (at 10.151.3.47@o2ib) [286363.397533] Lustre: Skipped 129 previous similar messages [286985.185939] Lustre: MGS: Connection restored to fda8e5ee-e754-938d-ae05-838c5769e465 (at 10.151.54.121@o2ib) [286985.185944] Lustre: Skipped 229 previous similar messages [287600.800173] Lustre: MGS: Connection restored to e200a537-f458-d536-805b-a2b5705fbd76 (at 10.141.6.73@o2ib417) [287600.800179] Lustre: Skipped 167 previous similar messages [288202.090171] Lustre: MGS: Connection restored to f2a560bd-03e7-7171-9a31-8717e3142d14 (at 10.149.1.35@o2ib313) [288202.090177] Lustre: Skipped 41 previous similar messages [288816.951616] Lustre: MGS: Connection restored to 4cf2f4f0-4459-ce04-2260-7d4be47dd393 (at 10.151.3.37@o2ib) [288816.951622] Lustre: Skipped 127 previous similar messages [289498.140290] Lustre: MGS: Connection restored to c36882b2-82c9-6af7-dc48-1fdd47cab2c7 (at 10.151.55.188@o2ib) [289498.140296] Lustre: Skipped 527 previous similar messages [290106.659761] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [290106.693255] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.195@o2ib (303): c: 32, oc: 0, rc: 32 [290207.353393] Lustre: MGS: Connection restored to 6c00740e-ac23-6ae1-62da-4bb5ebcd6264 (at 10.151.5.58@o2ib) [290207.353399] Lustre: Skipped 179 previous similar messages [290384.761066] Lustre: nbp8-MDT0000: haven't heard from client c2c153a1-77af-91ca-1dc1-cf94945a28a1 (at 10.151.37.177@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897edf7a8400, cur 1590972501 expire 1590972351 last 1590972274 [290499.674067] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [290499.707559] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.172@o2ib (320): c: 30, oc: 0, rc: 32 [290501.674231] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [290501.707725] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.177@o2ib (343): c: 30, oc: 0, rc: 32 [290503.675238] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [290503.708737] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.180@o2ib (344): c: 30, oc: 0, rc: 32 [290751.683332] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [290751.716833] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.91@o2ib (303): c: 32, oc: 0, rc: 32 [290803.686250] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [290803.719757] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [290803.752966] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.94@o2ib (304): c: 32, oc: 0, rc: 32 [290803.793600] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [290867.472025] Lustre: MGS: Connection restored to 704bb05d-94b8-d42f-28e8-92de7ce0cc72 (at 10.149.14.169@o2ib313) [290867.472031] Lustre: Skipped 167 previous similar messages [291525.160001] Lustre: MGS: Connection restored to 705239e6-9d49-17a0-c815-1e8e1437399c (at 10.151.35.211@o2ib) [291525.160007] Lustre: Skipped 155 previous similar messages [291951.727346] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [291951.760827] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [291951.794038] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.121@o2ib (304): c: 32, oc: 0, rc: 32 [291951.834959] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [292346.994520] Lustre: MGS: Connection restored to fd8c38ce-4271-d98b-2c39-639f314c27b3 (at 10.151.5.226@o2ib) [292346.994526] Lustre: Skipped 97 previous similar messages [293014.009604] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [293014.009610] Lustre: Skipped 83 previous similar messages [293641.216674] Lustre: MGS: Connection restored to 20c1f83d-b05d-bc4f-b80c-cf3d4025ce9a (at 10.151.50.119@o2ib) [293641.216679] Lustre: Skipped 97 previous similar messages [294453.400451] Lustre: MGS: Connection restored to e92d65b9-79e1-4bca-81ab-b56531b75a24 (at 10.151.10.144@o2ib) [294453.400457] Lustre: Skipped 77 previous similar messages [294528.821875] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [294528.855376] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.11@o2ib (303): c: 32, oc: 0, rc: 32 [295129.707293] Lustre: MGS: Connection restored to ccbcdd31-f6b8-e1f5-aaa8-d9961efac674 (at 10.151.23.135@o2ib) [295129.707299] Lustre: Skipped 185 previous similar messages [296101.608192] Lustre: MGS: Connection restored to 11a76c06-767e-4805-9c13-08a564dddcaa (at 10.151.7.113@o2ib) [296101.608199] Lustre: Skipped 53 previous similar messages [296811.495884] Lustre: MGS: Connection restored to 18ac275c-4cea-c249-5ddf-ed3ca2bed372 (at 10.141.2.0@o2ib417) [296811.495890] Lustre: Skipped 181 previous similar messages [297455.173412] Lustre: MGS: Connection restored to 8d66dfc0-01ee-ac61-dd15-10cf54ab76f0 (at 10.149.15.185@o2ib313) [297455.173419] Lustre: Skipped 103 previous similar messages [297664.936958] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [297664.970450] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.52.57@o2ib (303): c: 32, oc: 0, rc: 32 [298233.431236] Lustre: MGS: Connection restored to 1fc9c115-15e4-7df8-1092-28a97a71affb (at 10.151.12.28@o2ib) [298233.431242] Lustre: Skipped 59 previous similar messages [298635.972595] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [298636.006095] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.234@o2ib (221): c: 32, oc: 0, rc: 32 [298859.841226] Lustre: MGS: Connection restored to 62945951-71fe-553f-347c-2d06eef74091 (at 10.151.31.144@o2ib) [298859.841231] Lustre: Skipped 165 previous similar messages [298994.986795] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [298995.020289] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.116@o2ib (303): c: 32, oc: 0, rc: 32 [299048.987838] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [299049.021344] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.49@o2ib (281): c: 32, oc: 0, rc: 32 [299657.941001] Lustre: MGS: Connection restored to f292be86-ab28-1d3a-32b5-3250f8fe551f (at 10.151.44.242@o2ib) [299657.941007] Lustre: Skipped 201 previous similar messages [300281.059110] Lustre: MGS: Connection restored to 53033a28-0e24-26a2-0813-ce4cc726f446 (at 10.149.15.164@o2ib313) [300281.059116] Lustre: Skipped 3 previous similar messages [300565.043410] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [300565.076904] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.24@o2ib (299): c: 32, oc: 0, rc: 32 [301122.152827] Lustre: nbp8-MDT0000: haven't heard from client 23e48411-e50b-6ce3-c80e-c41c9c27ef1a (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3023a9400, cur 1590983238 expire 1590983088 last 1590983011 [301122.226098] Lustre: Skipped 5 previous similar messages [301147.778842] Lustre: MGS: Connection restored to 214be00d-099d-12bf-29c5-007a1eee785d (at 10.151.38.151@o2ib) [301147.778847] Lustre: Skipped 121 previous similar messages [301350.072281] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [301350.105770] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.14@o2ib (302): c: 32, oc: 0, rc: 32 [301814.479338] Lustre: MGS: Connection restored to 22b11e0d-d4c6-f836-3867-0f9c07d30ec6 (at 10.149.14.182@o2ib313) [301814.479344] Lustre: Skipped 273 previous similar messages [302417.972516] Lustre: MGS: Connection restored to 0e2dba24-a742-6eb1-3455-cedb12c432c6 (at 10.151.29.194@o2ib) [302417.972522] Lustre: Skipped 225 previous similar messages [302593.117853] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [302593.151346] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.142@o2ib (299): c: 32, oc: 0, rc: 32 [303019.364403] Lustre: MGS: Connection restored to 0a52f9cb-f8d4-61bf-d7db-1000a861384f (at 10.149.3.237@o2ib313) [303019.364414] Lustre: Skipped 183 previous similar messages [303620.790574] Lustre: MGS: Connection restored to 8ffa3dc1-a86f-e44b-e7d8-dad8cc655401 (at 10.149.2.84@o2ib313) [303620.790580] Lustre: Skipped 183 previous similar messages [304222.030707] Lustre: MGS: Connection restored to bbf5027b-8803-e3e3-276f-cf763c490294 (at 10.149.4.36@o2ib313) [304222.030713] Lustre: Skipped 115 previous similar messages [305022.047110] Lustre: MGS: Connection restored to 8b23b844-50f0-f961-2c9d-efbb6f0268ed (at 10.149.2.165@o2ib313) [305022.047116] Lustre: Skipped 61 previous similar messages [306021.042467] Lustre: MGS: Connection restored to b1dc3bd6-89d9-411b-2a93-ca7254024ca0 (at 10.151.29.200@o2ib) [306021.042473] Lustre: Skipped 177 previous similar messages [306868.569306] Lustre: MGS: Connection restored to feadee23-5be4-8ec9-eded-61b63f8a07bd (at 10.151.4.81@o2ib) [306868.569312] Lustre: Skipped 127 previous similar messages [307530.389135] Lustre: nbp8-MDT0000: haven't heard from client fb9c9db9-8a49-767a-4b6c-f36be6093aaa (at 10.153.13.50@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f1b37d400, cur 1590989646 expire 1590989496 last 1590989419 [307530.462400] Lustre: Skipped 17 previous similar messages [307724.717458] Lustre: MGS: Connection restored to fe22b396-df88-6464-bdf6-cc24a3121634 (at 10.151.8.97@o2ib) [307724.717464] Lustre: Skipped 73 previous similar messages [308207.421935] Lustre: MGS: haven't heard from client 1ddacfdd-0cd1-7c6b-2439-452736c5abec (at 10.141.3.176@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a383f2f400, cur 1590990323 expire 1590990173 last 1590990096 [308207.492632] Lustre: Skipped 1 previous similar message [308232.414260] Lustre: nbp8-MDT0000: haven't heard from client 9d5aa4f2-5a18-48f1-9980-079a80541181 (at 10.141.3.176@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d7a34fc00, cur 1590990348 expire 1590990198 last 1590990121 [308344.901318] Lustre: MGS: Connection restored to 13933e69-5f9b-7b7a-2693-fc1bbc1569cb (at 10.151.32.10@o2ib) [308344.901324] Lustre: Skipped 233 previous similar messages [309075.355898] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [309075.389402] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.128@o2ib (303): c: 32, oc: 0, rc: 32 [309274.362949] Lustre: MGS: Connection restored to 0773b5f7-08d7-b1cf-ba3d-7ce46d84f19b (at 10.149.10.8@o2ib313) [309274.362955] Lustre: Skipped 137 previous similar messages [309939.009309] Lustre: MGS: Connection restored to 9ef891df-ecf5-4bf9-4a4a-896e05606718 (at 10.151.37.251@o2ib) [309939.009315] Lustre: Skipped 15 previous similar messages [310547.739621] Lustre: MGS: Connection restored to d86ed199-0822-7892-d7cb-3e424f7dd1b6 (at 10.151.34.14@o2ib) [310547.739626] Lustre: Skipped 41 previous similar messages [311378.669035] Lustre: MGS: Connection restored to 8a9478af-15af-f749-5598-4725b85c069b (at 10.149.14.139@o2ib313) [311378.669041] Lustre: Skipped 1351 previous similar messages [311913.565901] Lustre: MGS: haven't heard from client 91c021ad-4731-2041-5c4e-00d45eec26e5 (at 10.151.18.19@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ddf9b1c00, cur 1590994029 expire 1590993879 last 1590993802 [311918.550752] Lustre: nbp8-MDT0000: haven't heard from client 95b13f8c-0859-4c2d-a283-87823a69feb7 (at 10.151.18.22@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a098323000, cur 1590994034 expire 1590993884 last 1590993807 [311918.623208] Lustre: Skipped 1 previous similar message [312029.464534] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [312029.498025] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [312029.531221] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.19@o2ib (325): c: 31, oc: 0, rc: 32 [312029.571851] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [312031.465522] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [312031.499022] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.22@o2ib (338): c: 30, oc: 0, rc: 32 [312807.184103] Lustre: MGS: Connection restored to f7e8fec9-760c-1b71-0031-b2930cb74d5d (at 10.151.54.100@o2ib) [312807.184108] Lustre: Skipped 145 previous similar messages [312891.272548] Lustre: MGS: Connection restored to 35361c16-a7fb-632e-89ec-943bda3b4a93 (at 10.151.35.83@o2ib) [312891.272554] Lustre: Skipped 19 previous similar messages [313096.753299] Lustre: MGS: Connection restored to 6b361ea1-8f8c-b3c4-954d-351c8bba3039 (at 10.149.8.115@o2ib313) [313096.753305] Lustre: Skipped 3 previous similar messages [313106.555288] Process accounting resumed [313557.934906] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [313557.934912] Lustre: Skipped 23 previous similar messages [314158.372196] Lustre: MGS: Connection restored to 5d77c741-2a1d-2088-ad09-94f4ed54398a (at 10.149.4.70@o2ib313) [314158.372202] Lustre: Skipped 1 previous similar message [314769.654451] Lustre: MGS: Connection restored to 1a2abbaa-640c-f538-d692-4ad3daadbb31 (at 10.151.16.204@o2ib) [314769.654457] Lustre: Skipped 273 previous similar messages [315486.995250] Lustre: MGS: Connection restored to f467ab6f-3bca-16b3-80e2-2138de6e9fcf (at 10.151.33.190@o2ib) [315486.995256] Lustre: Skipped 3 previous similar messages [317544.352853] Lustre: MGS: Connection restored to bd9ba461-200b-1442-d888-cad91bededbd (at 10.149.6.51@o2ib313) [317544.352859] Lustre: Skipped 1 previous similar message [317656.241043] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [317656.241049] Lustre: Skipped 59 previous similar messages [317912.609285] Lustre: MGS: Connection restored to b8d6c159-732e-52ab-54b5-7a764fa96204 (at 10.141.6.64@o2ib417) [317912.609291] Lustre: Skipped 1 previous similar message [318656.405772] Lustre: MGS: Connection restored to 8e7e5410-047a-fa53-2289-c74fbd8c94ef (at 10.149.15.86@o2ib313) [318656.405779] Lustre: Skipped 9 previous similar messages [319317.868711] Lustre: MGS: Connection restored to a9198ff8-6464-8a9e-dc40-600c5b168d7c (at 10.151.12.169@o2ib) [319317.868717] Lustre: Skipped 129 previous similar messages [320297.190523] Lustre: MGS: Connection restored to cd9928f0-5978-b445-9279-f9ceffb6e137 (at 10.151.3.118@o2ib) [320297.190529] Lustre: Skipped 103 previous similar messages [320944.612223] Lustre: MGS: Connection restored to 967ac92d-78e9-409b-3c50-f0d4127fb9a7 (at 10.151.32.36@o2ib) [320944.612229] Lustre: Skipped 98 previous similar messages [321606.010125] Lustre: MGS: Connection restored to 5e44c96a-f56a-7fa8-f681-517306800fb7 (at 10.151.32.89@o2ib) [321606.010130] Lustre: Skipped 27 previous similar messages [325631.496957] Lustre: MGS: Connection restored to 13d88218-3670-99a6-0276-9917b59bcd82 (at 10.151.28.13@o2ib) [325631.496963] Lustre: Skipped 189 previous similar messages [325911.219461] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [325911.219467] Lustre: Skipped 3 previous similar messages [326286.240427] Lustre: MGS: Connection restored to c3116a2a-89e8-49f0-44bb-249f30736d6c (at 10.151.3.161@o2ib) [326286.240433] Lustre: Skipped 1 previous similar message [326678.155842] Lustre: MGS: Connection restored to 27d4cc23-a1a4-ecfd-7337-bd3fd009625b (at 10.149.11.35@o2ib313) [326678.155847] Lustre: Skipped 37 previous similar messages [327346.699134] Lustre: MGS: Connection restored to 61192f1d-45b2-eeae-8270-1c9d64636f5a (at 10.151.15.101@o2ib) [327346.699139] Lustre: Skipped 195 previous similar messages [328519.821305] Lustre: MGS: Connection restored to b16b208c-ec66-9a8a-eb51-6f645bafaa2a (at 10.151.12.156@o2ib) [328519.821311] Lustre: Skipped 51 previous similar messages [329279.807140] Lustre: MGS: Connection restored to 676d4605-a96b-36bf-0ac4-d2334f1dc256 (at 10.149.15.199@o2ib313) [329279.807146] Lustre: Skipped 77 previous similar messages [329946.226056] Lustre: MGS: Connection restored to e3058b17-2d03-29e3-f6e0-7b806ee9966d (at 10.149.10.46@o2ib313) [329946.226062] Lustre: Skipped 257 previous similar messages [330565.750645] Lustre: MGS: Connection restored to 490149d5-321f-9e30-0b40-e9893af43946 (at 10.153.13.102@o2ib233) [330565.750651] Lustre: Skipped 1355 previous similar messages [331253.829182] Lustre: MGS: Connection restored to 15426714-2421-b871-bc71-11f93339900b (at 10.149.14.223@o2ib313) [331253.829187] Lustre: Skipped 1339 previous similar messages [331894.543707] Lustre: MGS: Connection restored to e3ad2374-3eae-ac19-ab2c-d144c1aaac75 (at 10.141.3.40@o2ib417) [331894.543712] Lustre: Skipped 119 previous similar messages [332781.815298] Lustre: MGS: Connection restored to bfedb826-bab1-fe20-4275-52b1b124dec2 (at 10.151.28.159@o2ib) [332781.815305] Lustre: Skipped 327 previous similar messages [333409.217027] Lustre: MGS: Connection restored to 6b361ea1-8f8c-b3c4-954d-351c8bba3039 (at 10.149.8.115@o2ib313) [333409.217033] Lustre: Skipped 3 previous similar messages [334035.763630] Lustre: MGS: Connection restored to 43244a41-0e26-3af4-e4be-9bb4f39f1557 (at 10.151.3.183@o2ib) [334035.763635] Lustre: Skipped 5 previous similar messages [335286.406524] Lustre: MGS: Connection restored to b40dfaab-4365-2bca-92d7-86e92d97e282 (at 10.149.12.135@o2ib313) [335286.406530] Lustre: Skipped 63 previous similar messages [335580.416292] Lustre: nbp8-MDT0000: haven't heard from client 4c0e1886-5c9b-a636-0eb6-f02ea2f25771 (at 10.153.14.12@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e83f2c00, cur 1591017695 expire 1591017545 last 1591017468 [335580.489539] Lustre: Skipped 1 previous similar message [335589.252369] Lustre: MGS: Connection restored to 81df8a59-e8bc-88a0-d563-cb62440a832e (at 10.149.14.149@o2ib313) [335589.252375] Lustre: Skipped 59 previous similar messages [335755.957359] Lustre: MGS: Connection restored to 38c07e8b-7f1f-c964-d7d2-dc3a0078c38f (at 10.151.28.70@o2ib) [335755.957372] Lustre: Skipped 59 previous similar messages [336559.674829] Lustre: MGS: Connection restored to 9857a835-9155-2f29-b27e-e053b9fe6694 (at 10.151.39.8@o2ib) [336559.674834] Lustre: Skipped 121 previous similar messages [337299.841357] Lustre: MGS: Connection restored to f7b315dd-8cca-3f3c-ff16-b14b5837ed45 (at 10.151.31.48@o2ib) [337299.841362] Lustre: Skipped 37 previous similar messages [337921.363706] Lustre: MGS: Connection restored to 18ac275c-4cea-c249-5ddf-ed3ca2bed372 (at 10.141.2.0@o2ib417) [337921.363712] Lustre: Skipped 1819 previous similar messages [338644.189257] Lustre: MGS: Connection restored to 5fb66e64-b957-a3a3-4de6-7b1afa9cfda8 (at 10.149.11.150@o2ib313) [338644.189263] Lustre: Skipped 543 previous similar messages [339327.890863] Lustre: MGS: Connection restored to 62168799-ffd3-66ba-8c23-becd27eff00d (at 10.151.28.200@o2ib) [339327.890870] Lustre: Skipped 155 previous similar messages [340509.485843] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [340509.485849] Lustre: Skipped 199 previous similar messages [341257.082815] Lustre: MGS: Connection restored to 1cf494bc-d035-b3c5-7058-ec583d6c6a84 (at 10.149.2.190@o2ib313) [341257.082820] Lustre: Skipped 85 previous similar messages [342019.703683] Lustre: MGS: Connection restored to d30aac62-d821-5fea-a960-f9d6d4e35e7a (at 10.151.11.215@o2ib) [342019.703688] Lustre: Skipped 793 previous similar messages [342024.653291] Lustre: MGS: haven't heard from client c7f07aae-6263-42b7-dea4-15cab3558969 (at 10.149.4.54@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f6a5b1400, cur 1591024139 expire 1591023989 last 1591023912 [342024.723685] Lustre: Skipped 15 previous similar messages [342035.653412] Lustre: nbp8-MDT0000: haven't heard from client 5cc4833f-3b9e-ddac-732d-be974f851eeb (at 10.149.4.54@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a39fd54c00, cur 1591024150 expire 1591024000 last 1591023923 [342035.726373] Lustre: Skipped 2 previous similar messages [342444.668428] Lustre: nbp8-MDT0000: haven't heard from client 1f64a0e8-e505-2146-8a20-e2de9d3879ec (at 10.153.15.180@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ec363f800, cur 1591024559 expire 1591024409 last 1591024332 [342444.741989] Lustre: Skipped 2 previous similar messages [342712.238758] Lustre: MGS: Connection restored to 8c1bd8bc-9be6-7328-fa72-16ae8e505a12 (at 10.149.2.227@o2ib313) [342712.238764] Lustre: Skipped 209 previous similar messages [343126.603558] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [343126.637067] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.110@o2ib (302): c: 32, oc: 0, rc: 32 [343440.011239] Lustre: MGS: Connection restored to 99b300f1-fc6c-437f-4135-6ca87dc3687b (at 10.151.34.211@o2ib) [343440.011244] Lustre: Skipped 199 previous similar messages [344351.202718] Lustre: MGS: Connection restored to 3d466810-00ec-4ea6-ac35-4dffdc2f1431 (at 10.151.45.156@o2ib) [344351.202724] Lustre: Skipped 1293 previous similar messages [345047.401977] Lustre: MGS: Connection restored to 3e756db3-a641-b9aa-05b4-791333c726d9 (at 10.149.9.166@o2ib313) [345047.401983] Lustre: Skipped 17 previous similar messages [345303.797569] Lustre: 8606:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff899a68f98d80 x1666002432540224/t1529804920342(0) o36->30b12848-7192-9b96-069f-ad68e260ba38@10.151.12.75@o2ib:93/0 lens 488/3152 e 1 to 0 dl 1591027448 ref 2 fl Interpret:/0/0 rc 0/0 [345303.896561] Lustre: 8606:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 71 previous similar messages [345536.894086] LNet: Service thread pid 12650 was inactive for 551.92s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [345536.950171] Pid: 12650, comm: mdt01_098 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [345536.950172] Call Trace: [345536.950184] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [345536.973205] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [345536.973231] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [345536.973241] [] mdt_object_lock_internal+0x70/0x360 [mdt] [345536.973251] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [345536.973264] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [345536.973275] [] mdt_reint_setattr+0x676/0x1290 [mdt] [345536.973286] [] mdt_reint_rec+0x83/0x210 [mdt] [345536.973295] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [345536.973305] [] mdt_reint+0x67/0x140 [mdt] [345536.973356] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [345536.973389] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [345536.973421] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [345536.973425] [] kthread+0xd1/0xe0 [345536.973428] [] ret_from_fork_nospec_end+0x0/0x39 [345536.973453] [] 0xffffffffffffffff [345536.973455] LustreError: dumping log to /tmp/lustre-log.1591027651.12650 [345537.722609] LNet: Service thread pid 10534 was inactive for 552.77s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [345537.778695] Pid: 10534, comm: mdt01_073 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [345537.778696] Call Trace: [345537.778709] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [345537.801733] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [345537.801758] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [345537.801769] [] mdt_object_lock_internal+0x70/0x360 [mdt] [345537.801778] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [345537.801792] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [345537.801803] [] mdt_reint_setattr+0x676/0x1290 [mdt] [345537.801814] [] mdt_reint_rec+0x83/0x210 [mdt] [345537.801824] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [345537.801834] [] mdt_reint+0x67/0x140 [mdt] [345537.801884] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [345537.801918] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [345537.801949] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [345537.801954] [] kthread+0xd1/0xe0 [345537.801957] [] ret_from_fork_nospec_end+0x0/0x39 [345537.801982] [] 0xffffffffffffffff [345537.801986] Pid: 12651, comm: mdt01_099 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [345537.801986] Call Trace: [345537.802017] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [345537.802043] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [345537.802053] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [345537.802063] [] mdt_object_lock_internal+0x70/0x360 [mdt] [345537.802072] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [345537.802083] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [345537.802094] [] mdt_reint_setattr+0x676/0x1290 [mdt] [345537.802105] [] mdt_reint_rec+0x83/0x210 [mdt] [345537.802125] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [345537.802140] [] mdt_reint+0x67/0x140 [mdt] [345537.802181] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [345537.802217] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [345537.802253] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [345537.802259] [] kthread+0xd1/0xe0 [345537.802266] [] ret_from_fork_nospec_end+0x0/0x39 [345537.802273] [] 0xffffffffffffffff [345537.802278] Pid: 7286, comm: mdt01_009 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [345537.802280] Call Trace: [345537.802316] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [345537.802346] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [345537.802361] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [345537.802374] [] mdt_object_lock_internal+0x70/0x360 [mdt] [345537.802388] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [345537.802404] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [345537.802418] [] mdt_reint_setattr+0x676/0x1290 [mdt] [345537.802434] [] mdt_reint_rec+0x83/0x210 [mdt] [345537.802448] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [345537.802462] [] mdt_reint+0x67/0x140 [mdt] [345537.802502] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [345537.802536] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [345537.802571] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [345537.802582] [] kthread+0xd1/0xe0 [345537.802586] [] ret_from_fork_nospec_end+0x0/0x39 [345537.802593] [] 0xffffffffffffffff [345537.802596] Pid: 8526, comm: mdt01_015 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [345537.802596] Call Trace: [345537.802628] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [345537.802656] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [345537.802667] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [345537.802678] [] mdt_object_lock_internal+0x70/0x360 [mdt] [345537.802688] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [345537.802699] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [345537.802710] [] mdt_reint_setattr+0x676/0x1290 [mdt] [345537.802722] [] mdt_reint_rec+0x83/0x210 [mdt] [345537.802732] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [345537.802742] [] mdt_reint+0x67/0x140 [mdt] [345537.802780] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [345537.802812] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [345537.802842] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [345537.802845] [] kthread+0xd1/0xe0 [345537.802849] [] ret_from_fork_nospec_end+0x0/0x39 [345537.802853] [] 0xffffffffffffffff [345537.802856] LNet: Service thread pid 12642 was inactive for 552.85s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [345609.198151] Lustre: nbp8-MDT0000: Client 30b12848-7192-9b96-069f-ad68e260ba38 (at 10.151.12.75@o2ib) reconnecting [345609.232216] Lustre: Skipped 2 previous similar messages [345609.755414] Lustre: nbp8-MDT0000: Client ca2cbe72-47bc-7410-be7d-4a6a4ce27e15 (at 10.151.12.13@o2ib) reconnecting [345609.789476] Lustre: Skipped 4 previous similar messages [345809.945113] LustreError: 6562:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591027099, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff899fca7bba80/0xa22cee2e014577b3 lrc: 3/0,1 mode: --/PW res: [0x3608a9c51:0x10:0x0].0x0 bits 0x2/0x0 rrc: 123 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 6562 timeout: 0 lvb_type: 0 [345810.075019] LustreError: 6562:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 58 previous similar messages [345810.713153] LustreError: dumping log to /tmp/lustre-log.1591027924.8561 [345927.052426] Lustre: 8545:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff899bcabb2d00 x1666040071682528/t1529806081655(0) o36->1bfe3e3b-a913-0364-0c66-53ddea7a42a3@10.151.12.47@o2ib:716/0 lens 488/3152 e 0 to 0 dl 1591028071 ref 2 fl Interpret:/0/0 rc 0/0 [346126.674574] Lustre: MGS: Connection restored to c3a022bc-2b93-2237-de2b-6b27eb91f564 (at 10.149.8.142@o2ib313) [346126.674580] Lustre: Skipped 238 previous similar messages [346231.862708] Lustre: nbp8-MDT0000: Client 30b12848-7192-9b96-069f-ad68e260ba38 (at 10.151.12.75@o2ib) reconnecting [346231.896776] Lustre: Skipped 2 previous similar messages [346232.440213] Lustre: nbp8-MDT0000: Client 1bfe3e3b-a913-0364-0c66-53ddea7a42a3 (at 10.151.12.47@o2ib) reconnecting [346232.474273] Lustre: Skipped 1 previous similar message [346233.478728] Lustre: nbp8-MDT0000: Client 7b269e69-afb6-2856-cbbf-9e25064bdcdc (at 10.151.11.240@o2ib) reconnecting [346233.513078] Lustre: Skipped 4 previous similar messages [346309.018449] LNet: Service thread pid 12631 was inactive for 699.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [346309.074537] LNet: Skipped 3 previous similar messages [346309.091437] Pid: 12631, comm: mdt01_084 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346309.091443] Call Trace: [346309.091458] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346309.096437] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346309.096466] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346309.096476] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346309.096486] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346309.096500] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346309.096513] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346309.096524] [] mdt_reint_rec+0x83/0x210 [mdt] [346309.096534] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346309.096545] [] mdt_reint+0x67/0x140 [mdt] [346309.096597] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346309.096631] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346309.096664] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346309.096671] [] kthread+0xd1/0xe0 [346309.096676] [] ret_from_fork_nospec_end+0x0/0x39 [346309.096700] [] 0xffffffffffffffff [346309.096702] LustreError: dumping log to /tmp/lustre-log.1591028423.12631 [346309.806274] LNet: Service thread pid 14062 was inactive for 700.52s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [346309.862357] Pid: 14062, comm: mdt01_110 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346309.862359] Call Trace: [346309.862371] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346309.885394] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346309.885421] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346309.885432] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346309.885441] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346309.885455] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346309.885476] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346309.885493] [] mdt_reint_rec+0x83/0x210 [mdt] [346309.885508] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346309.885523] [] mdt_reint+0x67/0x140 [mdt] [346309.885578] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346309.885615] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346309.885651] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346309.885656] [] kthread+0xd1/0xe0 [346309.885661] [] ret_from_fork_nospec_end+0x0/0x39 [346309.885684] [] 0xffffffffffffffff [346309.885688] Pid: 12654, comm: mdt01_101 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346309.885688] Call Trace: [346309.885720] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346309.885749] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346309.885760] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346309.885771] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346309.885781] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346309.885792] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346309.885805] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346309.885818] [] mdt_reint_rec+0x83/0x210 [mdt] [346309.885827] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346309.885838] [] mdt_reint+0x67/0x140 [mdt] [346309.885876] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346309.885910] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346309.885942] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346309.885945] [] kthread+0xd1/0xe0 [346309.885947] [] ret_from_fork_nospec_end+0x0/0x39 [346309.885951] [] 0xffffffffffffffff [346309.885954] Pid: 7279, comm: mdt01_006 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346309.885954] Call Trace: [346309.885985] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346309.886013] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346309.886025] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346309.886034] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346309.886044] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346309.886057] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346309.886069] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346309.886080] [] mdt_reint_rec+0x83/0x210 [mdt] [346309.886090] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346309.886101] [] mdt_reint+0x67/0x140 [mdt] [346309.886140] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346309.886172] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346309.886204] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346309.886207] [] kthread+0xd1/0xe0 [346309.886210] [] ret_from_fork_nospec_end+0x0/0x39 [346309.886214] [] 0xffffffffffffffff [346309.886218] Pid: 12638, comm: mdt01_088 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346309.886218] Call Trace: [346309.886249] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346309.886277] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346309.886289] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346309.886298] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346309.886308] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346309.886320] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346309.886334] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346309.886345] [] mdt_reint_rec+0x83/0x210 [mdt] [346309.886355] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346309.886366] [] mdt_reint+0x67/0x140 [mdt] [346309.886405] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346309.886437] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346309.886469] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346309.886476] [] kthread+0xd1/0xe0 [346309.886478] [] ret_from_fork_nospec_end+0x0/0x39 [346309.886481] [] 0xffffffffffffffff [346309.886485] LNet: Service thread pid 12657 was inactive for 700.52s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [346309.886486] LNet: Skipped 57 previous similar messages [346434.282046] LustreError: 12645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591027723, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff89a3ef70b180/0xa22cee2e04b4a4fa lrc: 3/0,1 mode: --/PW res: [0x3608a9c51:0x10:0x0].0x0 bits 0x2/0x0 rrc: 135 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 12645 timeout: 0 lvb_type: 0 [346434.412509] LustreError: 12645:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 21 previous similar messages [346434.800063] LustreError: 8614:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591027723, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff89a3eb3f7bc0/0xa22cee2e04b560b1 lrc: 3/0,1 mode: --/PW res: [0x3608a9c51:0x10:0x0].0x0 bits 0x2/0x0 rrc: 135 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 8614 timeout: 0 lvb_type: 0 [346434.929947] LustreError: 8614:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 22 previous similar messages [346462.816801] Lustre: nbp8-MDT0000: haven't heard from client c7d3662a-8068-ca2c-a08d-fddd0fd53743 (at 10.151.44.240@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e4de91800, cur 1591028577 expire 1591028427 last 1591028350 [346462.889481] Lustre: Skipped 23 previous similar messages [346538.819112] Lustre: nbp8-MDT0000: haven't heard from client 72215bd9-f6de-4d1b-bdd1-3154627538f3 (at 10.151.7.232@o2ib) in 193 seconds. I think it's dead, and I am evicting it. exp ffff89a295b81800, cur 1591028653 expire 1591028503 last 1591028460 [346538.891504] Lustre: Skipped 16 previous similar messages [346550.691336] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff899a4c72a400 x1666144599929184/t0(0) o101->fd383c68-a298-e470-ee9e-a53c9f5c226b@10.149.3.97@o2ib313:584/0 lens 4512/0 e 0 to 0 dl 1591028694 ref 2 fl New:/0/ffffffff rc 0/-1 [346550.788323] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 46 previous similar messages [346641.830770] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89a3cb037080 x1667381327676928/t0(0) o101->8efe3273-fd53-5756-284f-8aee3a20f907@10.151.52.12@o2ib:676/0 lens 4512/0 e 0 to 0 dl 1591028786 ref 2 fl New:/0/ffffffff rc 0/-1 [346641.927194] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 365 previous similar messages [346748.826638] Lustre: nbp8-MDT0000: haven't heard from client 166bafea-e3ff-93d3-94bd-8c7878e0f2f4 (at 10.151.13.37@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ba82b400, cur 1591028863 expire 1591028713 last 1591028636 [346748.899051] Lustre: Skipped 7 previous similar messages [346770.221373] Lustre: 8564:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8977a5e1a400 x1665998000367984/t0(0) o101->2409db20-ad83-0d48-49df-e89c150ac84e@10.151.5.27@o2ib:49/0 lens 376/0 e 1 to 0 dl 1591028914 ref 2 fl New:/0/ffffffff rc 0/-1 [346770.316646] Lustre: 8564:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2065 previous similar messages [346824.832414] Lustre: nbp8-MDT0000: haven't heard from client 0d6db048-e9d0-7ae8-804a-92c499b4f7d4 (at 10.151.10.184@o2ib) in 189 seconds. I think it's dead, and I am evicting it. exp ffff899bec259c00, cur 1591028939 expire 1591028789 last 1591028750 [346824.905119] Lustre: Skipped 12 previous similar messages [346855.450601] Lustre: nbp8-MDT0000: Client fd383c68-a298-e470-ee9e-a53c9f5c226b (at 10.149.3.97@o2ib313) reconnecting [346855.485272] Lustre: nbp8-MDT0000: Connection restored to c81ceb9a-57db-738f-4859-b8577b295a5a (at 10.149.3.97@o2ib313) [346855.485274] Lustre: Skipped 19 previous similar messages [346856.048717] Lustre: nbp8-MDT0000: Client ca2cbe72-47bc-7410-be7d-4a6a4ce27e15 (at 10.151.12.13@o2ib) reconnecting [346856.082779] Lustre: Skipped 2 previous similar messages [346857.105231] Lustre: nbp8-MDT0000: Client 7b269e69-afb6-2856-cbbf-9e25064bdcdc (at 10.151.11.240@o2ib) reconnecting [346857.139578] Lustre: Skipped 14 previous similar messages [346859.205097] Lustre: nbp8-MDT0000: Client c197eaaa-b219-35fb-0fd7-2f3be32a83c5 (at 10.149.7.58@o2ib313) reconnecting [346859.239753] Lustre: Skipped 42 previous similar messages [346867.645846] Lustre: nbp8-MDT0000: Client 8da09fb9-92ee-3ae4-9a51-76a1b4f0abc6 (at 10.141.6.153@o2ib417) reconnecting [346867.680766] Lustre: Skipped 8 previous similar messages [346899.871024] Lustre: nbp8-MDT0000: Client 9bfdc95f-6a10-9895-b85e-0bf342740352 (at 10.149.9.239@o2ib313) reconnecting [346899.905961] Lustre: Skipped 1 previous similar message [346900.839526] Lustre: nbp8-MDT0000: haven't heard from client 16a5ffb5-bee4-4243-3ffd-ed1c61c206a3 (at 10.151.51.106@o2ib) in 192 seconds. I think it's dead, and I am evicting it. exp ffff897eec78dc00, cur 1591029015 expire 1591028865 last 1591028823 [346929.585208] LNet: Service thread pid 8536 was inactive for 697.57s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [346929.641018] LNet: Skipped 3 previous similar messages [346929.657926] Pid: 8536, comm: mdt01_020 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346929.657931] Call Trace: [346929.657943] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346929.662922] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346929.662947] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346929.662958] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346929.662967] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346929.662982] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346929.662996] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346929.663007] [] mdt_reint_rec+0x83/0x210 [mdt] [346929.663017] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346929.663028] [] mdt_reint+0x67/0x140 [mdt] [346929.663083] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346929.663118] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346929.663152] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346929.663157] [] kthread+0xd1/0xe0 [346929.663162] [] ret_from_fork_nospec_end+0x0/0x39 [346929.663185] [] 0xffffffffffffffff [346929.663189] LustreError: dumping log to /tmp/lustre-log.1591029043.8536 [346930.510255] LNet: Service thread pid 12628 was inactive for 698.57s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [346930.566339] Pid: 12628, comm: mdt01_081 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346930.566341] Call Trace: [346930.566354] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346930.589382] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346930.589407] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346930.589418] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346930.589428] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346930.589442] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346930.589454] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346930.589465] [] mdt_reint_rec+0x83/0x210 [mdt] [346930.589474] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346930.589484] [] mdt_reint+0x67/0x140 [mdt] [346930.589535] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346930.589569] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346930.589601] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346930.589607] [] kthread+0xd1/0xe0 [346930.589610] [] ret_from_fork_nospec_end+0x0/0x39 [346930.589635] [] 0xffffffffffffffff [346930.589638] Pid: 12641, comm: mdt01_090 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346930.589639] Call Trace: [346930.589670] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346930.589696] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346930.589707] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346930.589716] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346930.589726] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346930.589737] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346930.589748] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346930.589758] [] mdt_reint_rec+0x83/0x210 [mdt] [346930.589767] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346930.589777] [] mdt_reint+0x67/0x140 [mdt] [346930.589815] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346930.589846] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346930.589877] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346930.589879] [] kthread+0xd1/0xe0 [346930.589881] [] ret_from_fork_nospec_end+0x0/0x39 [346930.589885] [] 0xffffffffffffffff [346930.589888] Pid: 14076, comm: mdt01_114 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346930.589888] Call Trace: [346930.589919] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346930.589945] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346930.589955] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346930.589965] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346930.589974] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346930.589985] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346930.589996] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346930.590006] [] mdt_reint_rec+0x83/0x210 [mdt] [346930.590016] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346930.590025] [] mdt_reint+0x67/0x140 [mdt] [346930.590063] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346930.590095] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346930.590125] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346930.590127] [] kthread+0xd1/0xe0 [346930.590129] [] ret_from_fork_nospec_end+0x0/0x39 [346930.590133] [] 0xffffffffffffffff [346930.590139] Pid: 8545, comm: mdt01_024 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [346930.590139] Call Trace: [346930.590170] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [346930.590196] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [346930.590206] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [346930.590216] [] mdt_object_lock_internal+0x70/0x360 [mdt] [346930.590230] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [346930.590250] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [346930.590266] [] mdt_reint_setattr+0x676/0x1290 [mdt] [346930.590282] [] mdt_reint_rec+0x83/0x210 [mdt] [346930.590293] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [346930.590305] [] mdt_reint+0x67/0x140 [mdt] [346930.590346] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [346930.590381] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [346930.590416] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [346930.590426] [] kthread+0xd1/0xe0 [346930.590430] [] ret_from_fork_nospec_end+0x0/0x39 [346930.590436] [] 0xffffffffffffffff [346930.590439] LNet: Service thread pid 8535 was inactive for 698.64s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [346930.590441] LNet: Skipped 41 previous similar messages [346946.099344] Lustre: nbp8-MDT0000: Client e0ae77b1-ac96-89f1-29b6-b7084dd72f17 (at 10.151.37.234@o2ib) reconnecting [346976.836311] Lustre: nbp8-MDT0000: haven't heard from client b044b6b7-6997-5fba-790a-8a720ba955c3 (at 10.151.19.166@o2ib) in 166 seconds. I think it's dead, and I am evicting it. exp ffff89a139a56000, cur 1591029091 expire 1591028941 last 1591028925 [346978.685469] Lustre: nbp8-MDT0000: Client c03c59fe-6eac-8cee-4fea-a49c8c22f9b4 (at 10.151.9.121@o2ib) reconnecting [346978.719532] Lustre: Skipped 254 previous similar messages [347028.772859] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff899bd72c1b00 x1666040092545008/t0(0) o101->26b489ef-a54d-07af-a2f7-cc23dc1b054e@10.151.3.194@o2ib:307/0 lens 376/0 e 1 to 0 dl 1591029172 ref 2 fl New:/0/ffffffff rc 0/-1 [347028.868992] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1360 previous similar messages [347047.745439] Lustre: nbp8-MDT0000: Client 8af36469-27a7-018d-2d46-92002e98076d (at 10.151.4.27@o2ib) reconnecting [347047.779221] Lustre: Skipped 102 previous similar messages [347052.843299] Lustre: nbp8-MDT0000: haven't heard from client 2f51c5e0-bd2c-99eb-45a0-bf484f8eca70 (at 10.151.15.48@o2ib) in 221 seconds. I think it's dead, and I am evicting it. exp ffff899f57fa5800, cur 1591029167 expire 1591029017 last 1591028946 [347056.926888] LustreError: 12627:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591028346, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897ec4bdf500/0xa22cee2e083cce81 lrc: 3/0,1 mode: --/PW res: [0x3608a9c51:0x10:0x0].0x0 bits 0x2/0x0 rrc: 134 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 12627 timeout: 0 lvb_type: 0 [347056.926890] LustreError: 12628:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1591028346, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897d3dee2d00/0xa22cee2e083ccf37 lrc: 3/0,1 mode: --/PW res: [0x3608a9c51:0x10:0x0].0x0 bits 0x2/0x0 rrc: 134 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 12628 timeout: 0 lvb_type: 0 [347056.926894] LustreError: 12628:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 1 previous similar message [347057.223565] LustreError: 12627:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 9 previous similar messages [347126.680903] LustreError: 47155:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 701s [347128.840959] Lustre: nbp8-MDT0000: haven't heard from client 582d7185-56a3-be10-ef42-8555f655e84c (at 10.151.3.165@o2ib) in 204 seconds. I think it's dead, and I am evicting it. exp ffff899f6674e000, cur 1591029243 expire 1591029093 last 1591029039 [347128.913345] Lustre: Skipped 2 previous similar messages [347178.249195] Lustre: nbp8-MDT0000: Client 6a0c6c96-cfcd-d63a-407e-4b229d3ba7a4 (at 10.151.1.187@o2ib) reconnecting [347178.283257] Lustre: Skipped 56 previous similar messages [347204.845957] Lustre: nbp8-MDT0000: haven't heard from client e7e89747-dab2-6d1a-f30c-d15a02b09f9b (at 10.151.3.186@o2ib) in 209 seconds. I think it's dead, and I am evicting it. exp ffff89a0176f7800, cur 1591029319 expire 1591029169 last 1591029110 [347204.918357] Lustre: Skipped 1 previous similar message [347306.911564] LustreError: 47406:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 881s [347357.850330] Lustre: nbp8-MDT0000: haven't heard from client 6b847652-320d-597c-fd51-697e85ef5722 (at 10.151.32.139@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a27ae5b800, cur 1591029472 expire 1591029322 last 1591029245 [347357.923008] Lustre: Skipped 13 previous similar messages [347475.913751] Lustre: nbp8-MDT0000: Client 9c5fd75d-41f8-48fe-af1e-ec9dbb64cfc9 (at 10.141.5.87@o2ib417) reconnecting [347475.948386] Lustre: Skipped 189 previous similar messages [347475.966489] Lustre: nbp8-MDT0000: Connection restored to 081f5f86-b7a8-f83e-b1f3-5f0095ae6587 (at 10.141.5.87@o2ib417) [347475.966491] Lustre: Skipped 686 previous similar messages [347486.142087] LustreError: 47758:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1060s [347540.811655] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8976577aad00 x1666540226317376/t0(0) o101->28b5ff43-d28c-23f6-c83b-a634f37e1c31@10.151.36.29@o2ib:64/0 lens 584/0 e 1 to 0 dl 1591029684 ref 2 fl New:/2/ffffffff rc 0/-1 [347540.907505] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1978 previous similar messages [347666.372627] LustreError: 48015:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1240s [347744.863533] Lustre: nbp8-MDT0000: haven't heard from client fcf9f7c9-b14a-b288-7124-c2bb8ef467dd (at 10.149.5.140@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a016fbe000, cur 1591029859 expire 1591029709 last 1591029632 [347744.936784] Lustre: Skipped 28 previous similar messages [347846.603360] LustreError: 48260:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1421s [347993.370338] Lustre: nbp8-MDT0000: Client 6ffe721f-2510-e57b-42bf-e864d69894c8 (at 10.151.16.219@o2ib) reconnecting [347993.404698] Lustre: Skipped 726 previous similar messages [348026.837929] LustreError: 48486:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1601s [348093.420712] Lustre: nbp8-MDT0000: Connection restored to 3a7f693d-0cf6-040d-ceb6-1b46c897deb4 (at 10.151.28.176@o2ib) [348093.420718] Lustre: Skipped 797 previous similar messages [348142.013746] Lustre: 8564:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89776aad1f80 x1665998009902912/t0(0) o101->5b4ec80d-d386-937b-cf65-2720c362ed88@10.151.16.138@o2ib:666/0 lens 576/0 e 1 to 0 dl 1591030286 ref 2 fl New:/2/ffffffff rc 0/-1 [348142.109883] Lustre: 8564:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 2191 previous similar messages [348206.080542] LustreError: 48740:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1780s [348386.311142] LustreError: 48986:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1960s [348517.893976] Lustre: nbp8-MDT0000: haven't heard from client ad9757fd-c484-bc95-8e42-a9600c25419e (at 10.151.16.102@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a038262400, cur 1591030632 expire 1591030482 last 1591030405 [348517.966659] Lustre: Skipped 88 previous similar messages [348566.555236] LustreError: 49871:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 2141s [348601.690661] Lustre: nbp8-MDT0000: Client b2b67ca9-cd3c-016d-63e7-c10b522d051b (at 10.151.32.166@o2ib) reconnecting [348601.725016] Lustre: Skipped 311 previous similar messages [348694.962247] Lustre: nbp8-MDT0000: Connection restored to 7b3686c1-ef5b-d5f8-5f4c-7101a5662cf2 (at 10.151.28.231@o2ib) [348694.962252] Lustre: Skipped 613 previous similar messages [348744.323817] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/-125), not sending early reply req@ffff899bf3216780 x1666050911692720/t0(0) o39->3329fbed-8f91-3835-d596-ba72c1983716@10.151.17.189@o2ib:513/0 lens 224/0 e 0 to 0 dl 1591030888 ref 2 fl New:/0/ffffffff rc 0/-1 [348744.420530] Lustre: 10531:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 3693 previous similar messages [348746.792295] LustreError: 50220:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 2321s [348886.938657] Lustre: 6562:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (349:3553s); client may timeout. req@ffff899a6d68c800 x1666423864517424/t1529804920178(0) o36->f8849689-cb43-8380-e884-febc61cd8bbb@10.151.12.23@o2ib:93/0 lens 488/424 e 1 to 0 dl 1591027448 ref 1 fl Complete:/0/0 rc 0/0 [348886.938808] LNet: Service thread pid 8561 completed after 3901.88s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [348886.944232] LustreError: 8609:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.10.38@o2ib: deadline 600:268s ago req@ffff899bf321cc80 x1667917600295552/t0(0) o101->7a202e19-79a8-8021-be41-1e21053c44b5@10.151.10.38@o2ib:358/0 lens 576/0 e 1 to 0 dl 1591030733 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [348887.196150] Lustre: 6562:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2781 previous similar messages [348887.438328] Lustre: 8605:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (440:2215s); client may timeout. req@ffff899b823a0000 x1666041654638160/t0(0) o101->c6421dff-6e50-6854-8a25-08abd78ea57a@10.151.1.83@o2ib:676/0 lens 4512/0 e 0 to 0 dl 1591028786 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [348887.444311] LustreError: 8617:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.151.15.51@o2ib: deadline 600:265s ago req@ffff899bf0e25580 x1666148353950560/t0(0) o101->f9f783fc-0e10-5e61-55fa-610140c7d632@10.151.15.51@o2ib:361/0 lens 576/0 e 1 to 0 dl 1591030736 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [348887.444313] LustreError: 8617:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 82 previous similar messages [348887.676370] Lustre: 8605:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 2330 previous similar messages [348888.438539] Lustre: 12656:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600:501s); client may timeout. req@ffff899b973cd100 x1666040084519664/t0(0) o101->b2cf1e44-0392-7ecb-8bc1-74faa00ddb42@10.151.8.29@o2ib:126/0 lens 376/0 e 1 to 0 dl 1591030501 ref 1 fl Interpret:/2/ffffffff rc 0/-1 [348888.505555] LustreError: 6562:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.149.14.176@o2ib313: deadline 702:389s ago req@ffff8977687b5a00 x1665995743094000/t0(0) o39->1281bc9e-41f8-4b68-c205-c49028f49193@10.149.14.176@o2ib313:238/0 lens 224/0 e 0 to 0 dl 1591030613 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [348888.505557] LustreError: 6562:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 219 previous similar messages [348888.678577] Lustre: 12656:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 3857 previous similar messages [349213.484700] Lustre: nbp8-MDT0000: Client 1f21c281-53aa-e915-9838-95586f3154e2 (at 10.149.8.151@o2ib313) reconnecting [349213.519679] Lustre: Skipped 776 previous similar messages [349324.794301] Lustre: nbp8-MDT0000: Connection restored to 0ca0379a-0411-c72a-6740-b3d9fb97743a (at 10.149.2.102@o2ib313) [349324.794305] Lustre: Skipped 598 previous similar messages [350005.389941] Lustre: MGS: Connection restored to 478b4dd1-0dfb-ddee-98e5-051c8dd315b5 (at 10.151.32.170@o2ib) [350005.389946] Lustre: Skipped 427 previous similar messages [350656.250115] Lustre: MGS: Connection restored to 9993618a-49ff-a824-898c-07ec2980d6e1 (at 10.149.12.205@o2ib313) [350656.250120] Lustre: Skipped 81 previous similar messages [351323.131689] Lustre: MGS: Connection restored to 0439b7e9-be09-01ea-fcb7-b52dd00e6bbb (at 10.151.3.52@o2ib) [351323.131695] Lustre: Skipped 118 previous similar messages [351975.216357] Lustre: MGS: Connection restored to 78a678a0-8656-3b51-653c-6f5889418844 (at 10.149.6.76@o2ib313) [351975.216363] Lustre: Skipped 147 previous similar messages [352580.641696] Lustre: MGS: Connection restored to 341a80d3-c0d7-e3ba-5761-1c14491a5644 (at 10.151.8.13@o2ib) [352580.641702] Lustre: Skipped 19 previous similar messages [353298.800044] Lustre: MGS: Connection restored to dfade296-1a9c-c36a-7c35-b42a21810cca (at 10.151.52.178@o2ib) [353298.800050] Lustre: Skipped 61 previous similar messages [353999.349877] Lustre: MGS: Connection restored to 2d2e3458-86ac-3a1b-8021-41b81c6ff140 (at 10.151.0.179@o2ib) [353999.349883] Lustre: Skipped 249 previous similar messages [355106.547599] Lustre: MGS: Connection restored to f862147d-5e3a-7163-1b87-d083af5eafee (at 10.151.16.88@o2ib) [355106.547604] Lustre: Skipped 299 previous similar messages [355720.200507] Lustre: MGS: Connection restored to 20f66789-4fa3-9522-e9ae-48cbb0a7baa2 (at 10.151.18.166@o2ib) [355720.200512] Lustre: Skipped 177 previous similar messages [356320.308978] Lustre: MGS: Connection restored to 133b5650-0e70-30ea-d81d-d33d97cb0d22 (at 10.151.45.12@o2ib) [356320.308984] Lustre: Skipped 126 previous similar messages [356986.231491] Lustre: MGS: Connection restored to 2e5ebb1e-8c4a-fbdb-472f-faa4f07f0c76 (at 10.151.32.151@o2ib) [356986.231497] Lustre: Skipped 242 previous similar messages [357664.765444] Lustre: MGS: Connection restored to f1ee187c-b61e-568a-f8d6-1d3d9a736dcd (at 10.151.18.187@o2ib) [357664.765450] Lustre: Skipped 115 previous similar messages [358402.990836] Lustre: MGS: Connection restored to dec6bbdc-5499-107f-4598-472388c71e25 (at 10.151.33.16@o2ib) [358402.990841] Lustre: Skipped 131 previous similar messages [359021.985811] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [359021.985817] Lustre: Skipped 39 previous similar messages [359646.743187] Lustre: MGS: Connection restored to 0439b7e9-be09-01ea-fcb7-b52dd00e6bbb (at 10.151.3.52@o2ib) [359646.743192] Lustre: Skipped 343 previous similar messages [360362.663720] Lustre: MGS: Connection restored to 33ece1cc-94f6-cdc5-5a1e-2eb7b352dad6 (at 10.151.2.221@o2ib) [360362.663726] Lustre: Skipped 289 previous similar messages [361237.933892] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [361237.933898] Lustre: Skipped 53 previous similar messages [362228.465044] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [362228.465050] Lustre: Skipped 23 previous similar messages [362851.077822] Lustre: MGS: Connection restored to 0195fd6b-38b1-b4ae-ece3-9087fa459a68 (at 10.149.1.221@o2ib313) [362851.077829] Lustre: Skipped 205 previous similar messages [363513.535647] Lustre: MGS: Connection restored to 341a80d3-c0d7-e3ba-5761-1c14491a5644 (at 10.151.8.13@o2ib) [363513.535653] Lustre: Skipped 543 previous similar messages [364141.853247] Lustre: MGS: Connection restored to 3d41c02c-4d21-55f6-1aa8-4fa4624d1dc0 (at 10.151.43.172@o2ib) [364141.853253] Lustre: Skipped 237 previous similar messages [364809.032440] Lustre: MGS: Connection restored to b050374b-85f1-ab29-d6c4-d70b76cf28ec (at 10.149.6.96@o2ib313) [364809.032446] Lustre: Skipped 209 previous similar messages [365422.609996] Lustre: MGS: Connection restored to fa8605d1-1fca-91d2-ae8b-ca14b201c3ee (at 10.151.3.70@o2ib) [365422.610001] Lustre: Skipped 67 previous similar messages [366088.630171] Lustre: MGS: Connection restored to a11ba706-b0e2-dd53-5fc2-b9e5dfa01e3d (at 10.149.14.147@o2ib313) [366088.630178] Lustre: Skipped 649 previous similar messages [366748.168206] Lustre: MGS: Connection restored to ea7d9d30-6667-7dcc-724a-6418e0e7c090 (at 10.151.52.180@o2ib) [366748.168213] Lustre: Skipped 275 previous similar messages [367110.483810] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [367110.517320] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.151@o2ib (303): c: 32, oc: 0, rc: 32 [367352.143917] Lustre: MGS: Connection restored to 014540f7-ef97-d4b8-0dda-a12f0abb185a (at 10.151.44.11@o2ib) [367352.143922] Lustre: Skipped 283 previous similar messages [367500.586648] Lustre: nbp8-MDT0000: haven't heard from client 76563dfd-0351-fe61-4561-478da8854dba (at 10.151.23.82@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f6af53000, cur 1591049614 expire 1591049464 last 1591049387 [367596.500740] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [367596.534246] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.82@o2ib (322): c: 30, oc: 0, rc: 32 [367621.501670] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [367621.535176] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [367621.568372] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.31@o2ib (328): c: 30, oc: 0, rc: 32 [367621.609000] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [368170.315170] Lustre: MGS: Connection restored to 5bd5ffc9-14b8-df9a-b857-081db70589ba (at 10.149.12.57@o2ib313) [368170.315176] Lustre: Skipped 225 previous similar messages [368829.055002] Lustre: MGS: Connection restored to db667276-4973-4a4a-8224-95848fc65722 (at 10.151.37.116@o2ib) [368829.055008] Lustre: Skipped 603 previous similar messages [369453.313426] Lustre: MGS: Connection restored to 0439b7e9-be09-01ea-fcb7-b52dd00e6bbb (at 10.151.3.52@o2ib) [369453.313431] Lustre: Skipped 315 previous similar messages [370058.903873] Lustre: MGS: Connection restored to 62d35b90-d867-daa0-1bfa-23b99e00b477 (at 10.149.15.121@o2ib313) [370058.903879] Lustre: Skipped 455 previous similar messages [370829.457566] Lustre: MGS: Connection restored to 44396008-8470-2e98-ca5a-d708f7dc8f21 (at 10.151.54.122@o2ib) [370829.457572] Lustre: Skipped 271 previous similar messages [371516.835506] Lustre: MGS: Connection restored to 53167723-c802-9e3f-56d8-cf6d8c602d53 (at 10.151.15.97@o2ib) [371516.835512] Lustre: Skipped 337 previous similar messages [372195.610328] Lustre: MGS: Connection restored to 80ac0db9-0648-e11c-edcf-cbf3d852a1a0 (at 10.151.6.66@o2ib) [372195.610334] Lustre: Skipped 95 previous similar messages [372484.681339] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [372484.714845] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [372484.748332] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.149@o2ib (220): c: 32, oc: 0, rc: 32 [372484.789268] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [372497.680951] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [372497.714453] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.75@o2ib (303): c: 32, oc: 0, rc: 32 [372526.681873] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [372526.715379] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.43@o2ib (278): c: 32, oc: 0, rc: 32 [372567.683458] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [372567.716947] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.13@o2ib (303): c: 32, oc: 0, rc: 32 [372921.560682] Lustre: MGS: Connection restored to 962c8c09-a517-4e56-da46-daed47075107 (at 10.151.44.146@o2ib) [372921.560688] Lustre: Skipped 41 previous similar messages [373527.332531] Lustre: MGS: Connection restored to 86361629-928c-1713-40e7-83271c4e2cd6 (at 10.151.50.144@o2ib) [373527.332537] Lustre: Skipped 59 previous similar messages [374089.739235] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [374089.772728] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.86@o2ib (298): c: 32, oc: 0, rc: 32 [374129.915491] Lustre: MGS: Connection restored to e649c367-03ff-3bd9-05ba-c1b5581b2008 (at 10.151.52.51@o2ib) [374129.915497] Lustre: Skipped 229 previous similar messages [374295.746704] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [374295.780206] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.93@o2ib (216): c: 32, oc: 0, rc: 32 [374734.899110] Lustre: MGS: Connection restored to 2b8bf937-bf58-0198-e800-32deb53d6be7 (at 10.151.38.58@o2ib) [374734.899116] Lustre: Skipped 127 previous similar messages [375364.795941] Lustre: MGS: Connection restored to b8d95100-2b1b-f0e1-a23e-6691c941ea1d (at 10.149.3.214@o2ib313) [375364.795948] Lustre: Skipped 59 previous similar messages [375849.803588] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [375849.837084] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.93@o2ib (291): c: 32, oc: 0, rc: 32 [375861.894110] Lustre: nbp8-MDT0000: haven't heard from client cc365eed-5ed6-4234-506f-cca88976f1c2 (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897fe305a400, cur 1591057975 expire 1591057825 last 1591057748 [375861.967359] Lustre: Skipped 9 previous similar messages [375876.804676] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [375876.838175] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.121@o2ib (237): c: 32, oc: 0, rc: 32 [375940.807035] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [375940.840532] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.63@o2ib (303): c: 32, oc: 0, rc: 32 [375996.537740] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [375996.537746] Lustre: Skipped 75 previous similar messages [376607.088874] Lustre: MGS: Connection restored to 2c3cf03c-a5e4-3604-be1d-a5148cb632d2 (at 10.151.32.145@o2ib) [376607.088880] Lustre: Skipped 69 previous similar messages [377207.847483] Lustre: MGS: Connection restored to 5ba6afc4-9da7-a13a-c1e9-d5dbeace6089 (at 10.141.7.15@o2ib417) [377207.847490] Lustre: Skipped 83 previous similar messages [377922.487827] Lustre: MGS: Connection restored to b1de6b7f-9681-fc8a-0b52-d5075c812369 (at 10.149.1.28@o2ib313) [377922.487833] Lustre: Skipped 831 previous similar messages [378606.208856] Lustre: MGS: Connection restored to dff685b1-b215-cda0-74e8-ab793721e82d (at 10.151.23.26@o2ib) [378606.208861] Lustre: Skipped 247 previous similar messages [379234.915540] Lustre: MGS: Connection restored to 43a19893-5e17-37a4-2eba-6b220c6a7aa7 (at 10.149.2.77@o2ib313) [379234.915545] Lustre: Skipped 57 previous similar messages [379845.021419] Lustre: MGS: Connection restored to 9a2357c7-aec3-c1bb-be7c-4ecabf1d424a (at 10.149.1.36@o2ib313) [379845.021424] Lustre: Skipped 389 previous similar messages [380454.176235] Lustre: MGS: Connection restored to eca01c1a-4f27-33ed-f68d-9c1850d1067a (at 10.151.28.205@o2ib) [380454.176244] Lustre: Skipped 177 previous similar messages [381104.602761] Lustre: MGS: Connection restored to b8e2a3ab-8ef7-74b7-cf09-b205f410abc7 (at 10.151.3.32@o2ib) [381104.602767] Lustre: Skipped 49 previous similar messages [381157.997836] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [381158.031335] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.98@o2ib (303): c: 32, oc: 0, rc: 32 [381706.820281] Lustre: MGS: Connection restored to 4f98ebbc-0e62-e0d7-ee70-3e40aa3cacd4 (at 10.151.38.33@o2ib) [381706.820287] Lustre: Skipped 439 previous similar messages [382358.865379] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [382358.865385] Lustre: Skipped 1005 previous similar messages [382916.063031] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [382916.096525] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.182@o2ib (303): c: 32, oc: 0, rc: 32 [382919.062250] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [382919.095751] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.189@o2ib (243): c: 32, oc: 0, rc: 32 [383196.394640] Lustre: MGS: Connection restored to ff6c501f-06bb-1bf6-fc6e-ab8b05d5e3f1 (at 10.151.33.138@o2ib) [383196.394646] Lustre: Skipped 87 previous similar messages [383801.472652] Lustre: MGS: Connection restored to 1e2368f8-8a83-e8a6-5b93-034eb93036ad (at 10.151.31.52@o2ib) [383801.472658] Lustre: Skipped 135 previous similar messages [384233.199333] Lustre: nbp8-MDT0000: haven't heard from client 15094f45-69ba-d5a6-9717-b745d15ad224 (at 10.153.10.138@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f60653c00, cur 1591066346 expire 1591066196 last 1591066119 [384233.272876] Lustre: Skipped 1 previous similar message [384411.299707] Lustre: MGS: Connection restored to e4609b52-58d4-aef5-4d33-07120cb6a51c (at 10.151.35.42@o2ib) [384411.299712] Lustre: Skipped 105 previous similar messages [385042.350335] Lustre: MGS: Connection restored to 7940b99f-4383-6199-02be-28607a11853f (at 10.149.3.216@o2ib313) [385042.350341] Lustre: Skipped 161 previous similar messages [385231.146710] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [385231.180202] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.52.150@o2ib (303): c: 32, oc: 0, rc: 32 [385650.086056] Lustre: MGS: Connection restored to f5303ff0-3f2b-4994-eeab-a06f75a9211b (at 10.151.24.96@o2ib) [385650.086062] Lustre: Skipped 249 previous similar messages [386171.182120] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [386171.215622] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.30@o2ib (289): c: 32, oc: 0, rc: 32 [386252.410012] Lustre: MGS: Connection restored to 7fb19670-1e52-853d-d731-8b3d6855525c (at 10.149.1.189@o2ib313) [386252.410018] Lustre: Skipped 33 previous similar messages [387185.003350] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [387185.003355] Lustre: Skipped 61 previous similar messages [387865.978071] Lustre: MGS: Connection restored to ffc4228f-969f-de11-73a6-0d5b0d3b0d4c (at 10.151.6.20@o2ib) [387865.978077] Lustre: Skipped 169 previous similar messages [388551.695984] Lustre: MGS: Connection restored to 7ca242c4-795a-4af9-ac41-6891addcf4e3 (at 10.149.3.54@o2ib313) [388551.695990] Lustre: Skipped 487 previous similar messages [389151.757206] Lustre: nbp8-MDT0000: Connection restored to 8c8d02ed-185c-ab75-4b73-845cbd8cb96b (at 10.151.45.172@o2ib) [389151.757212] Lustre: Skipped 482 previous similar messages [389842.436433] Lustre: MGS: Connection restored to cb6a9d05-64f7-136a-9693-50b82d31a723 (at 10.151.28.10@o2ib) [389842.436438] Lustre: Skipped 160 previous similar messages [390530.444468] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [390530.444474] Lustre: Skipped 77 previous similar messages [391240.766569] Lustre: MGS: Connection restored to 822c9fd3-c1f9-b5de-0111-7e7ac5c49f0c (at 10.151.30.73@o2ib) [391240.766575] Lustre: Skipped 91 previous similar messages [391902.294596] Lustre: MGS: Connection restored to 531d14ac-af2c-37f6-8b15-b06350a81066 (at 10.149.8.32@o2ib313) [391902.294601] Lustre: Skipped 43 previous similar messages [392541.591820] Lustre: MGS: Connection restored to 2b3d1328-1dc8-abac-e861-22d578f73508 (at 10.149.3.182@o2ib313) [392541.591826] Lustre: Skipped 177 previous similar messages [393219.331883] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [393219.331889] Lustre: Skipped 1111 previous similar messages [393840.902020] Lustre: MGS: Connection restored to 0b712552-7f02-4663-21e3-1a5788b918a2 (at 10.149.8.133@o2ib313) [393840.902026] Lustre: Skipped 305 previous similar messages [394526.166197] Lustre: MGS: Connection restored to 9e1f58d0-acc9-498f-989f-4588d2ba53a5 (at 10.149.15.62@o2ib313) [394526.166203] Lustre: Skipped 49 previous similar messages [395159.477666] Lustre: MGS: Connection restored to 7cab292e-c0d0-5ea8-c6bd-8590697bdcd2 (at 10.151.34.182@o2ib) [395159.477672] Lustre: Skipped 51 previous similar messages [395787.220173] Lustre: MGS: Connection restored to da35e41b-589e-c834-31db-baecb5183c77 (at 10.149.1.148@o2ib313) [395787.220178] Lustre: Skipped 191 previous similar messages [396389.094529] Lustre: MGS: Connection restored to 36fae12b-9d76-f032-55a5-96e9806ca024 (at 10.151.53.45@o2ib) [396389.094535] Lustre: Skipped 335 previous similar messages [397019.799749] Lustre: MGS: Connection restored to eb120f55-295a-e301-45bd-3e0f9785ba7d (at 10.151.55.146@o2ib) [397019.799754] Lustre: Skipped 223 previous similar messages [397681.127251] Lustre: MGS: Connection restored to 6296cee4-436b-1552-edb0-7ace037f5d8f (at 10.151.32.38@o2ib) [397681.127257] Lustre: Skipped 23 previous similar messages [398356.367062] Lustre: MGS: Connection restored to 341a54e8-0687-be28-77da-95ee7edff1ee (at 10.151.32.34@o2ib) [398356.367067] Lustre: Skipped 223 previous similar messages [398972.936538] Lustre: MGS: Connection restored to 9638535b-97cd-7dec-92aa-1cc32babcd34 (at 10.149.12.242@o2ib313) [398972.936544] Lustre: Skipped 67 previous similar messages [399603.999300] Lustre: MGS: Connection restored to 3ad6b2c7-6ebd-c0b2-f227-3b98313da0f8 (at 10.151.51.45@o2ib) [399603.999305] Lustre: Skipped 151 previous similar messages [399749.579882] Process accounting resumed [400204.033604] Lustre: nbp8-MDT0000: Connection restored to dd7dffa7-74e6-e853-35db-7668a970317b (at 10.151.45.17@o2ib) [400204.033610] Lustre: Skipped 402 previous similar messages [400831.755959] Lustre: MGS: Connection restored to c24e08ae-87c3-ba27-4ee6-e8f7f31ecb48 (at 10.151.14.221@o2ib) [400831.755965] Lustre: Skipped 42 previous similar messages [402477.827462] Lustre: MGS: Connection restored to 75102e82-b863-6587-32ed-df83142cc4d1 (at 10.151.37.130@o2ib) [402477.827468] Lustre: Skipped 39 previous similar messages [402649.131497] Lustre: MGS: Connection restored to d9703531-97c2-541a-4026-8a8322871217 (at 10.151.32.11@o2ib) [402649.131503] Lustre: Skipped 1 previous similar message [402799.581807] Lustre: MGS: Connection restored to cbcc2b67-9b2c-cf71-5d69-e3773bcb417f (at 10.151.31.186@o2ib) [402799.581813] Lustre: Skipped 143 previous similar messages [403100.020026] Lustre: MGS: Connection restored to 860268f4-4c6d-8d91-371a-52118324f182 (at 10.151.2.80@o2ib) [403100.020032] Lustre: Skipped 43 previous similar messages [403755.350768] Lustre: MGS: Connection restored to a61f5698-d82f-307a-e8bc-aa0434b464ef (at 10.151.52.162@o2ib) [403755.350774] Lustre: Skipped 459 previous similar messages [404356.800147] Lustre: MGS: Connection restored to d7e2252b-90da-86b3-c9de-6b6fcaf41019 (at 10.151.45.170@o2ib) [404356.800153] Lustre: Skipped 323 previous similar messages [405083.795932] Lustre: MGS: Connection restored to a652f244-366c-e7d7-8f1b-1a98d30ccc72 (at 10.149.14.12@o2ib313) [405083.795937] Lustre: Skipped 63 previous similar messages [405709.180530] Lustre: MGS: Connection restored to 24b59e80-805c-5807-78fd-352dc87ae71d (at 10.151.33.130@o2ib) [405709.180536] Lustre: Skipped 341 previous similar messages [406437.544579] Lustre: MGS: Connection restored to af57eec2-0fb3-ba53-4de5-8b4d60dd756b (at 10.149.4.231@o2ib313) [406437.544585] Lustre: Skipped 875 previous similar messages [407067.676476] Lustre: MGS: Connection restored to 9e1f58d0-acc9-498f-989f-4588d2ba53a5 (at 10.149.15.62@o2ib313) [407067.676481] Lustre: Skipped 349 previous similar messages [407774.553407] Lustre: MGS: Connection restored to 9a6a5a4e-909b-e27f-e647-ef3d7d0aca58 (at 10.151.30.66@o2ib) [407774.553413] Lustre: Skipped 569 previous similar messages [408565.780510] Lustre: MGS: Connection restored to b7df03de-38f0-4b7e-c627-60fd8e03b250 (at 10.151.52.177@o2ib) [408565.780516] Lustre: Skipped 125 previous similar messages [409168.505150] Lustre: MGS: Connection restored to 3d18af77-3100-6b52-cb58-9948533e9545 (at 10.151.2.170@o2ib) [409168.505156] Lustre: Skipped 333 previous similar messages [409760.046287] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [409760.079774] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.234@o2ib (298): c: 32, oc: 0, rc: 32 [409930.540388] Lustre: MGS: Connection restored to da374612-18be-df13-fa94-67dec6846d6e (at 10.149.1.222@o2ib313) [409930.540394] Lustre: Skipped 83 previous similar messages [410682.710981] Lustre: MGS: Connection restored to 532cf7e9-cf67-9b67-1799-6031044c4bc4 (at 10.151.35.155@o2ib) [410682.710986] Lustre: Skipped 145 previous similar messages [411385.923130] Lustre: MGS: Connection restored to 348d6495-6c85-6344-8b48-4cd1733923bc (at 10.149.2.151@o2ib313) [411385.923135] Lustre: Skipped 7 previous similar messages [412035.384936] Lustre: MGS: Connection restored to 8c7f1583-cd05-475b-c645-b7ce71b9aee2 (at 10.151.37.173@o2ib) [412035.384942] Lustre: Skipped 199 previous similar messages [412939.863927] Lustre: MGS: Connection restored to 3d829d99-e2ac-37ab-df37-b8cda276285a (at 10.151.3.108@o2ib) [412939.863933] Lustre: Skipped 33 previous similar messages [414294.441983] Lustre: MGS: Connection restored to 794d391e-09c9-aca7-59f8-9476b57600f0 (at 10.149.2.13@o2ib313) [414294.441989] Lustre: Skipped 63 previous similar messages [414443.371549] Lustre: MGS: Connection restored to b1c12598-6a8e-effe-23c5-d154100caa66 (at 10.149.6.68@o2ib313) [414443.371556] Lustre: Skipped 25 previous similar messages [414846.280262] Lustre: MGS: Connection restored to 886909c0-6137-b107-1fe8-8c7196e8018c (at 10.151.34.38@o2ib) [414846.280268] Lustre: Skipped 59 previous similar messages [415148.626444] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [415148.626450] Lustre: Skipped 33 previous similar messages [415934.132713] Lustre: MGS: Connection restored to ce249dc8-07d3-f6e8-1821-8129fca9bee6 (at 10.151.3.116@o2ib) [415934.132719] Lustre: Skipped 401 previous similar messages [416969.950240] Lustre: MGS: Connection restored to bc298ed8-1eac-4d29-91fc-78335c128377 (at 10.151.57.149@o2ib) [416969.950246] Lustre: Skipped 19 previous similar messages [417621.482690] Lustre: MGS: Connection restored to f0a2a298-faf5-a153-1c33-33bfe7f95f10 (at 10.149.1.20@o2ib313) [417621.482696] Lustre: Skipped 933 previous similar messages [418576.601774] Lustre: MGS: Connection restored to eeb1448f-55ff-f621-3a97-52faf83cd8c7 (at 10.151.3.29@o2ib) [418576.601780] Lustre: Skipped 119 previous similar messages [419216.614009] Lustre: MGS: Connection restored to 980b1d02-7620-a8cd-6f6c-eb4f55672bb4 (at 10.149.3.25@o2ib313) [419216.614015] Lustre: Skipped 225 previous similar messages [419911.077206] Lustre: MGS: Connection restored to f0eb3ae7-965f-b8e6-c0ba-ba2a2d0815de (at 10.151.37.48@o2ib) [419911.077212] Lustre: Skipped 417 previous similar messages [420537.920551] Lustre: MGS: Connection restored to aec6fa76-579c-c980-58e0-c130582947c4 (at 10.151.29.201@o2ib) [420537.920557] Lustre: Skipped 63 previous similar messages [421233.058999] Lustre: MGS: Connection restored to f79c50fb-7aca-cf50-f545-c8b869c2d280 (at 10.149.5.88@o2ib313) [421233.059005] Lustre: Skipped 417 previous similar messages [422065.165099] Lustre: MGS: Connection restored to 5dee1a2b-1d7c-fe4c-9413-fb0319a28a88 (at 10.151.32.167@o2ib) [422065.165104] Lustre: Skipped 239 previous similar messages [422897.421624] Lustre: MGS: Connection restored to 63d0e821-ae84-23d1-1d62-56c6e19bff24 (at 10.151.31.180@o2ib) [422897.421631] Lustre: Skipped 189 previous similar messages [423521.970242] Lustre: MGS: Connection restored to 9b1093c2-681f-9b98-477d-8c4cac071790 (at 10.151.3.123@o2ib) [423521.970247] Lustre: Skipped 13 previous similar messages [424147.071953] Lustre: MGS: Connection restored to eeb1448f-55ff-f621-3a97-52faf83cd8c7 (at 10.151.3.29@o2ib) [424147.071959] Lustre: Skipped 649 previous similar messages [424918.120627] Lustre: MGS: Connection restored to 99815eb2-0974-c6ba-b3d8-43bc6fade23e (at 10.151.19.11@o2ib) [424918.120634] Lustre: Skipped 327 previous similar messages [425018.696110] Lustre: MGS: haven't heard from client eb995142-1111-7da4-64ef-1dece4509b52 (at 10.151.54.145@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ec4e94000, cur 1591107130 expire 1591106980 last 1591106903 [425018.766217] Lustre: Skipped 1 previous similar message [425031.695426] Lustre: nbp8-MDT0000: haven't heard from client 254a10a9-836c-045c-7d8e-bcc9611bf9d8 (at 10.151.54.145@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897a817c3400, cur 1591107143 expire 1591106993 last 1591106916 [425128.608995] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [425128.642494] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.145@o2ib (323): c: 23, oc: 0, rc: 32 [425648.202518] Lustre: MGS: Connection restored to 6da922de-242e-6e03-d6b6-7bf4f884d289 (at 10.151.38.136@o2ib) [425648.202524] Lustre: Skipped 357 previous similar messages [426499.830881] Lustre: MGS: Connection restored to f5e1f854-b1e0-0a23-a85d-126023cb26cf (at 10.151.54.76@o2ib) [426499.830887] Lustre: Skipped 117 previous similar messages [427291.381783] Lustre: MGS: Connection restored to 6dfd7528-d462-3839-2761-884b6553ca52 (at 10.151.17.152@o2ib) [427291.381788] Lustre: Skipped 63 previous similar messages [427935.466338] Lustre: MGS: Connection restored to 9e1f58d0-acc9-498f-989f-4588d2ba53a5 (at 10.149.15.62@o2ib313) [427935.466344] Lustre: Skipped 13 previous similar messages [428578.237365] Lustre: MGS: Connection restored to a69b6b8c-942a-f015-5d31-da18ef969e1b (at 10.151.35.117@o2ib) [428578.237371] Lustre: Skipped 1177 previous similar messages [429286.982475] Lustre: MGS: Connection restored to ad21b351-e495-fb61-ce7c-9d00a9527292 (at 10.151.3.36@o2ib) [429286.982481] Lustre: Skipped 25 previous similar messages [430115.416529] Lustre: MGS: Connection restored to 6ac2ded7-ad05-3d22-1c29-9e4dbd4741d7 (at 10.149.1.127@o2ib313) [430115.416535] Lustre: Skipped 51 previous similar messages [430820.669529] Lustre: MGS: Connection restored to 7285a73f-f154-8347-38a9-d99d2e060024 (at 10.149.1.122@o2ib313) [430820.669535] Lustre: Skipped 411 previous similar messages [431478.329168] Lustre: MGS: Connection restored to 6f1b15de-ce6b-2b46-936d-26fea05aec13 (at 10.151.32.82@o2ib) [431478.329173] Lustre: Skipped 335 previous similar messages [432141.324617] Lustre: MGS: Connection restored to 18ac275c-4cea-c249-5ddf-ed3ca2bed372 (at 10.141.2.0@o2ib417) [432141.324623] Lustre: Skipped 307 previous similar messages [432791.094211] Lustre: MGS: Connection restored to f93ed0ab-4a33-3467-8037-df44553b8314 (at 10.149.9.22@o2ib313) [432791.094217] Lustre: Skipped 403 previous similar messages [433537.665395] Lustre: MGS: Connection restored to 0459cb74-ab0a-c8f3-4585-d7b188fd787a (at 10.151.15.108@o2ib) [433537.665401] Lustre: Skipped 91 previous similar messages [434232.821205] Lustre: MGS: Connection restored to 25982681-ff15-e968-2e48-8282f3570871 (at 10.151.39.110@o2ib) [434232.821211] Lustre: Skipped 53 previous similar messages [434899.691226] Lustre: MGS: Connection restored to c10360c0-4372-bcc5-2b7c-45cd934d9c05 (at 10.151.28.90@o2ib) [434899.691232] Lustre: Skipped 213 previous similar messages [434911.968434] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [434912.001928] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.208@o2ib (303): c: 32, oc: 0, rc: 32 [435183.205281] LNet: 4181:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.28.202@o2ib version 12/12 incarnation 1588879414034303/1591117248521459 [435601.660166] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [435601.660172] Lustre: Skipped 99 previous similar messages [436209.722775] Lustre: MGS: Connection restored to 83a150ef-597e-44e9-919d-d29628e48d58 (at 10.151.13.65@o2ib) [436209.722781] Lustre: Skipped 25 previous similar messages [436914.260760] Lustre: MGS: Connection restored to c463a1d6-d378-e7f1-9b12-b81fb84ee1d9 (at 10.151.39.114@o2ib) [436914.260766] Lustre: Skipped 135 previous similar messages [437524.217881] Lustre: MGS: Connection restored to 7d5eb50b-31b9-606b-163c-01d254a8f244 (at 10.151.6.112@o2ib) [437524.217887] Lustre: Skipped 343 previous similar messages [438144.345721] Lustre: MGS: Connection restored to 53cd4d34-c799-913d-6a1f-89b1361e11a8 (at 10.151.45.176@o2ib) [438144.345726] Lustre: Skipped 335 previous similar messages [438794.015029] Lustre: MGS: Connection restored to fa8605d1-1fca-91d2-ae8b-ca14b201c3ee (at 10.151.3.70@o2ib) [438794.015035] Lustre: Skipped 43 previous similar messages [439409.964564] Lustre: MGS: Connection restored to 7597a689-1ca8-c821-eb54-2e918a29e0ea (at 10.151.3.149@o2ib) [439409.964570] Lustre: Skipped 297 previous similar messages [440036.479247] Lustre: MGS: Connection restored to 978adafd-41be-5f86-5dce-a5d900bb33ce (at 10.151.19.167@o2ib) [440036.479253] Lustre: Skipped 781 previous similar messages [440835.924656] Lustre: MGS: Connection restored to f66129d5-f427-070b-bf60-568072b303d3 (at 10.151.6.221@o2ib) [440835.924661] Lustre: Skipped 187 previous similar messages [441488.306151] Lustre: MGS: Connection restored to f886de78-c57a-9a5e-06d6-17b2a3694c7e (at 10.151.45.55@o2ib) [441488.306157] Lustre: Skipped 279 previous similar messages [442286.418084] Lustre: MGS: Connection restored to 9558b3c3-1842-d7e1-9f0d-456e260294ef (at 10.151.31.159@o2ib) [442286.418090] Lustre: Skipped 273 previous similar messages [442909.149492] Lustre: MGS: Connection restored to 64993c43-a9f1-097f-68c6-a06dad78885c (at 10.151.11.153@o2ib) [442909.149498] Lustre: Skipped 127 previous similar messages [443515.138490] Lustre: MGS: Connection restored to e64a9065-602f-b113-63e4-4598c4399a89 (at 10.149.1.83@o2ib313) [443515.138496] Lustre: Skipped 55 previous similar messages [444159.712220] Lustre: MGS: Connection restored to e8b9e34f-792c-9257-d8df-50ecde33052e (at 10.149.9.21@o2ib313) [444159.712226] Lustre: Skipped 297 previous similar messages [444775.581455] Lustre: MGS: Connection restored to 341a80d3-c0d7-e3ba-5761-1c14491a5644 (at 10.151.8.13@o2ib) [444775.581460] Lustre: Skipped 77 previous similar messages [445383.514488] Lustre: MGS: Connection restored to a2986d63-5549-8936-30bd-cb623d69ce7f (at 10.151.34.43@o2ib) [445383.514494] Lustre: Skipped 547 previous similar messages [445984.452692] Lustre: MGS: Connection restored to a78ec0c4-fba8-f537-a35a-bfcdb5edc427 (at 10.151.29.129@o2ib) [445984.452698] Lustre: Skipped 261 previous similar messages [446616.443230] Lustre: MGS: Connection restored to 2f5f22b1-1ecf-5940-402b-c21cb2e95691 (at 10.151.50.244@o2ib) [446616.443236] Lustre: Skipped 241 previous similar messages [447218.059119] Lustre: MGS: Connection restored to 12b4444c-58ae-4a21-d659-e97bdef02cec (at 10.151.10.209@o2ib) [447218.059124] Lustre: Skipped 153 previous similar messages [447432.425926] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [447432.459420] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.39.111@o2ib (316): c: 31, oc: 0, rc: 32 [447890.915231] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [447890.915237] Lustre: Skipped 263 previous similar messages [448511.570648] Lustre: MGS: Connection restored to 9ac2cfdf-c5f7-297d-4663-266582ea9a19 (at 10.151.36.55@o2ib) [448511.570654] Lustre: Skipped 93 previous similar messages [448562.468211] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [448562.501709] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.50@o2ib (272): c: 32, oc: 0, rc: 32 [449172.815605] Lustre: MGS: Connection restored to 0bd9b7ca-0378-55cb-c2a8-81782ffe3c16 (at 10.151.3.50@o2ib) [449172.815611] Lustre: Skipped 211 previous similar messages [449877.731273] Lustre: MGS: Connection restored to 1134ba53-b248-ab85-ab93-cc0036f98eaa (at 10.151.24.13@o2ib) [449877.731279] Lustre: Skipped 171 previous similar messages [450613.758619] Lustre: MGS: Connection restored to 2eaa62ea-aad8-398e-f716-57859c03b481 (at 10.151.56.50@o2ib) [450613.758625] Lustre: Skipped 165 previous similar messages [451308.784911] Lustre: MGS: Connection restored to 424cbdd1-afc8-7ebc-1bce-1c5dc6f7f5bf (at 10.149.3.128@o2ib313) [451308.784916] Lustre: Skipped 143 previous similar messages [452002.065979] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [452002.065985] Lustre: Skipped 189 previous similar messages [452607.358346] Lustre: MGS: Connection restored to 64993c43-a9f1-097f-68c6-a06dad78885c (at 10.151.11.153@o2ib) [452607.358352] Lustre: Skipped 131 previous similar messages [453297.134158] Lustre: MGS: Connection restored to d2191f1e-6a9f-75bb-d28d-f3b245f6e7e9 (at 10.149.9.215@o2ib313) [453297.134164] Lustre: Skipped 257 previous similar messages [454371.937510] Lustre: MGS: Connection restored to 0d10fd15-d358-2e2b-e9fb-b35b12f58bc2 (at 10.151.24.22@o2ib) [454371.937515] Lustre: Skipped 109 previous similar messages [455066.050654] Lustre: MGS: Connection restored to caaac0db-d49c-4ac3-bc1d-26a7393f693a (at 10.151.33.135@o2ib) [455066.050660] Lustre: Skipped 147 previous similar messages [455670.333345] Lustre: MGS: Connection restored to 0ff215ad-d6ce-8493-1573-0528507cda94 (at 10.149.1.21@o2ib313) [455670.333351] Lustre: Skipped 173 previous similar messages [456181.839854] Lustre: MGS: haven't heard from client 8a8ffe33-6a12-a61b-2b06-cdc7b119c08c (at 10.149.2.98@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897931841800, cur 1591138292 expire 1591138142 last 1591138065 [456185.836782] Lustre: nbp8-MDT0000: haven't heard from client 6e439042-25dc-6b48-82a6-8710b54f1809 (at 10.149.2.98@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cfadc7800, cur 1591138296 expire 1591138146 last 1591138069 [456285.752533] Lustre: MGS: Connection restored to 859d4cb8-8600-4a6b-13be-93182d756131 (at 10.151.39.105@o2ib) [456285.752538] Lustre: Skipped 65 previous similar messages [456996.597087] Lustre: MGS: Connection restored to 455ce3dc-68f3-2577-c7dc-71cc2b058469 (at 10.151.8.11@o2ib) [456996.597093] Lustre: Skipped 101 previous similar messages [457625.925493] Lustre: MGS: Connection restored to e9f0adbd-df84-926c-d334-146599e2c671 (at 10.151.32.110@o2ib) [457625.925499] Lustre: Skipped 299 previous similar messages [458233.258643] Lustre: MGS: Connection restored to ae2ce2d7-609c-31c4-7188-49bb27b41c88 (at 10.151.3.45@o2ib) [458233.258649] Lustre: Skipped 99 previous similar messages [459108.267601] Lustre: MGS: Connection restored to 5a13c6d0-75f9-6b5b-deae-ee6a1fc2b5ef (at 10.151.8.45@o2ib) [459108.267607] Lustre: Skipped 27 previous similar messages [460149.174185] Lustre: MGS: Connection restored to 6b653834-384f-464e-186e-64680f354490 (at 10.149.4.149@o2ib313) [460149.174190] Lustre: Skipped 27 previous similar messages [460779.327277] Lustre: MGS: Connection restored to 6ab01443-653b-01d9-c1c1-20ef511a72c7 (at 10.149.3.158@o2ib313) [460779.327282] Lustre: Skipped 81 previous similar messages [461392.103627] Lustre: MGS: Connection restored to 356b75de-9243-965f-27ec-c08f41367e13 (at 10.149.5.111@o2ib313) [461392.103633] Lustre: Skipped 1113 previous similar messages [462174.428345] Lustre: MGS: Connection restored to 6131a211-ee36-c813-eac1-da7320e53d1c (at 10.151.33.100@o2ib) [462174.428350] Lustre: Skipped 19 previous similar messages [462741.074899] Lustre: nbp8-MDT0000: haven't heard from client a1ee1dab-3734-775e-049c-fc19a1437d66 (at 10.149.2.170@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a305ae2000, cur 1591144851 expire 1591144701 last 1591144624 [462911.785553] Lustre: MGS: Connection restored to df4d80c4-7418-0ca0-3c3d-c171bf363a79 (at 10.151.32.98@o2ib) [462911.785559] Lustre: Skipped 185 previous similar messages [463528.636859] Lustre: MGS: Connection restored to 5bd5a562-4cb7-3ce4-cd77-6303857ee2ea (at 10.153.10.199@o2ib233) [463528.636865] Lustre: Skipped 241 previous similar messages [464130.876934] Lustre: MGS: Connection restored to 9d89947a-bc9a-855a-9d1a-e8d9305615fc (at 10.151.35.129@o2ib) [464130.876940] Lustre: Skipped 963 previous similar messages [464830.007016] Lustre: MGS: Connection restored to 1ad03996-bd4d-dc9d-8702-4fa5a72a887b (at 10.149.3.123@o2ib313) [464830.007022] Lustre: Skipped 35 previous similar messages [465440.322727] Lustre: MGS: Connection restored to ff565539-2fee-f192-c387-e9bd9925018e (at 10.141.5.102@o2ib417) [465440.322733] Lustre: Skipped 63 previous similar messages [466103.456880] Lustre: MGS: Connection restored to f8f80a98-e4c4-d556-69f4-4bba068aa2d7 (at 10.151.56.47@o2ib) [466103.456886] Lustre: Skipped 87 previous similar messages [466803.470284] Lustre: MGS: Connection restored to 592543c3-3430-58cf-da91-5079fbc131b5 (at 10.151.29.156@o2ib) [466803.470290] Lustre: Skipped 399 previous similar messages [467622.728263] Lustre: MGS: Connection restored to 8f92048b-6350-0614-6f91-7a36ad2a1415 (at 10.151.30.179@o2ib) [467622.728269] Lustre: Skipped 629 previous similar messages [468249.538978] Lustre: MGS: Connection restored to a8e449d2-41ec-0d29-fd91-b686fceadc20 (at 10.151.17.237@o2ib) [468249.538983] Lustre: Skipped 11 previous similar messages [469038.001867] Lustre: MGS: Connection restored to 701b6b93-53d2-f911-2cf1-18973d89c6c1 (at 10.151.7.205@o2ib) [469038.001873] Lustre: Skipped 195 previous similar messages [469880.729164] Lustre: MGS: Connection restored to 2b4a0e6d-429c-2782-62f0-76edac40e9c7 (at 10.149.2.111@o2ib313) [469880.729170] Lustre: Skipped 95 previous similar messages [470486.266067] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [470486.299564] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [470486.332765] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.184@o2ib (228): c: 32, oc: 0, rc: 32 [470486.373679] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [470657.119782] Lustre: MGS: Connection restored to 2ff775bb-92bc-1750-d489-d033f02e6efb (at 10.151.32.14@o2ib) [470657.119788] Lustre: Skipped 137 previous similar messages [470825.279296] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [470825.312783] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.221@o2ib (300): c: 32, oc: 0, rc: 32 [471389.199716] Lustre: MGS: Connection restored to 79c977b4-8c7e-c101-53e1-7b11dea4b652 (at 10.151.47.76@o2ib) [471389.199722] Lustre: Skipped 121 previous similar messages [472022.724476] Lustre: MGS: Connection restored to 6ac2ded7-ad05-3d22-1c29-9e4dbd4741d7 (at 10.149.1.127@o2ib313) [472022.724481] Lustre: Skipped 139 previous similar messages [472649.219043] Lustre: MGS: Connection restored to d0cb7e2d-cac9-2336-94d8-40118e7ee9cb (at 10.151.18.184@o2ib) [472649.219049] Lustre: Skipped 109 previous similar messages [473255.456375] Lustre: nbp8-MDT0000: haven't heard from client 979b5209-84c7-0337-5e2d-e7cc5ab80524 (at 10.151.34.178@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d73bd5000, cur 1591155365 expire 1591155215 last 1591155138 [473255.529055] Lustre: Skipped 1 previous similar message [473339.369268] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [473339.402776] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.178@o2ib (310): c: 30, oc: 0, rc: 32 [473341.941490] Lustre: MGS: Connection restored to 87bd60af-c88b-d66a-0c54-baf424c700af (at 10.153.11.189@o2ib233) [473341.941496] Lustre: Skipped 695 previous similar messages [473486.374617] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [473486.408111] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.126@o2ib (277): c: 32, oc: 0, rc: 32 [473794.477317] Lustre: MGS: haven't heard from client 53975164-4908-21a0-8275-02794158bc0f (at 10.153.17.81@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ef0e8cc00, cur 1591155904 expire 1591155754 last 1591155677 [473794.547990] Lustre: Skipped 1 previous similar message [473818.476685] Lustre: nbp8-MDT0000: haven't heard from client dc66a1ed-1ec1-cad5-b921-2d09e3c97579 (at 10.153.17.81@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ea1a2a800, cur 1591155928 expire 1591155778 last 1591155701 [474057.484326] Lustre: nbp8-MDT0000: haven't heard from client ab55889d-9470-e49f-4094-51277272fb02 (at 10.151.37.49@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976bb60cc00, cur 1591156167 expire 1591156017 last 1591155940 [474163.399226] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [474163.432729] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.49@o2ib (332): c: 30, oc: 0, rc: 32 [474172.691333] Lustre: MGS: Connection restored to 0c8ae8d7-561c-2268-8c3c-b2d3081c9624 (at 10.149.1.3@o2ib313) [474172.691339] Lustre: Skipped 717 previous similar messages [474410.409092] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [474410.442598] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.156@o2ib (299): c: 32, oc: 0, rc: 32 [474813.633763] Lustre: MGS: Connection restored to 5a13c6d0-75f9-6b5b-deae-ee6a1fc2b5ef (at 10.151.8.45@o2ib) [474813.633768] Lustre: Skipped 23 previous similar messages [475644.750827] Lustre: MGS: Connection restored to b44e2fe3-dc0b-dd71-1002-cb4e24a719b3 (at 10.151.29.34@o2ib) [475644.750832] Lustre: Skipped 25 previous similar messages [476414.521357] Lustre: MGS: Connection restored to 9c797d4b-29f2-df56-eab1-4d97e5f2943e (at 10.151.37.185@o2ib) [476414.521363] Lustre: Skipped 1 previous similar message [477031.289608] Lustre: MGS: Connection restored to 6da922de-242e-6e03-d6b6-7bf4f884d289 (at 10.151.38.136@o2ib) [477031.289614] Lustre: Skipped 118 previous similar messages [477665.985247] Lustre: MGS: Connection restored to 8e73b6d5-08a3-f106-1f15-b41b52051abd (at 10.153.11.22@o2ib233) [477665.985253] Lustre: Skipped 127 previous similar messages [478293.394988] Lustre: MGS: Connection restored to 61ea0174-0851-5990-e590-0534fdab580d (at 10.151.32.153@o2ib) [478293.394994] Lustre: Skipped 39 previous similar messages [478948.990412] Lustre: MGS: Connection restored to 5e31a993-984d-6cc4-917d-2138326ea7d4 (at 10.153.11.39@o2ib233) [478948.990418] Lustre: Skipped 139 previous similar messages [479587.185860] Lustre: MGS: Connection restored to 6cc3ac70-cdb7-c812-6640-7524f9a1eac5 (at 10.153.10.62@o2ib233) [479587.185866] Lustre: Skipped 271 previous similar messages [480255.443390] Lustre: MGS: Connection restored to be47a464-f5cd-b537-ef4b-4cbec6b02b0c (at 10.151.35.17@o2ib) [480255.443396] Lustre: Skipped 177 previous similar messages [480889.199050] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [480889.199056] Lustre: Skipped 87 previous similar messages [481541.912127] Lustre: MGS: Connection restored to df9de228-566d-5477-2810-ac9e072b1938 (at 10.153.10.84@o2ib233) [481541.912133] Lustre: Skipped 91 previous similar messages [482141.986306] Lustre: nbp8-MDT0000: Connection restored to a5000f76-b630-e3e6-937f-89732301666c (at 10.151.35.103@o2ib) [482141.986312] Lustre: Skipped 188 previous similar messages [482743.316627] Lustre: MGS: Connection restored to 930f5ede-fe6a-12fe-be32-239a3c289140 (at 10.151.3.196@o2ib) [482743.316632] Lustre: Skipped 66 previous similar messages [483236.729329] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [483236.762830] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.191@o2ib (288): c: 32, oc: 0, rc: 32 [483268.730519] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [483268.764020] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.154@o2ib (303): c: 32, oc: 0, rc: 32 [483375.800946] Lustre: MGS: Connection restored to b006db7d-713d-dee8-37a5-b5d5dfba7dad (at 10.151.32.69@o2ib) [483375.800952] Lustre: Skipped 145 previous similar messages [483447.832157] Lustre: MGS: haven't heard from client 5ef1d0a3-29be-1f75-2b70-a3a7f5e631c8 (at 10.153.10.182@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ec4b23000, cur 1591165557 expire 1591165407 last 1591165330 [483447.903121] Lustre: Skipped 1 previous similar message [483458.826333] Lustre: nbp8-MDT0000: haven't heard from client e7b53f44-1bd0-d618-0860-99fb6b6a3a65 (at 10.153.10.182@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998d232e800, cur 1591165568 expire 1591165418 last 1591165341 [484432.766909] Lustre: MGS: Connection restored to 5bc28b4a-f5ac-013c-634d-9d90ea19a54f (at 10.151.15.146@o2ib) [484432.766915] Lustre: Skipped 213 previous similar messages [485038.411660] Lustre: MGS: Connection restored to 181fdad7-dd8d-abbd-4070-0842c7beb265 (at 10.151.9.123@o2ib) [485038.411666] Lustre: Skipped 277 previous similar messages [485680.206532] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [485680.206538] Lustre: Skipped 199 previous similar messages [485913.342829] Process accounting resumed [486450.313135] Lustre: MGS: Connection restored to 9ebf5ef8-244c-ea96-0159-06cbc92082fb (at 10.151.53.178@o2ib) [486450.313141] Lustre: Skipped 1 previous similar message [487163.733148] Lustre: MGS: Connection restored to df3de6b4-299d-1775-20f0-e63f4a41716e (at 10.153.10.16@o2ib233) [487163.733154] Lustre: Skipped 25 previous similar messages [487911.570086] Lustre: MGS: Connection restored to 57985ffa-bada-6230-a700-136a1313c537 (at 10.151.17.98@o2ib) [487911.570092] Lustre: Skipped 1161 previous similar messages [488604.132637] Lustre: MGS: Connection restored to 51aa82c9-6d50-19b4-ae0e-9086e1192334 (at 10.153.10.17@o2ib233) [488604.132643] Lustre: Skipped 371 previous similar messages [489374.426251] Lustre: MGS: Connection restored to 3e1eb6cd-4cbd-66d3-2734-fe45fe2b6cf7 (at 10.149.6.128@o2ib313) [489374.426256] Lustre: Skipped 313 previous similar messages [489986.344211] Lustre: MGS: Connection restored to 5b162dab-10b0-45c3-0269-3a2a876b3b57 (at 10.151.29.94@o2ib) [489986.344217] Lustre: Skipped 245 previous similar messages [490989.102727] Lustre: nbp8-MDT0000: haven't heard from client 128873a9-ac6f-a907-1385-ef8871bfeaa4 (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a5ae7c400, cur 1591173098 expire 1591172948 last 1591172871 [490992.104017] Lustre: MGS: haven't heard from client 22410bea-fbb1-ed3a-542e-10054a1ea8e9 (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e99ca0000, cur 1591173101 expire 1591172951 last 1591172874 [491002.554598] Lustre: MGS: Connection restored to f64be5fa-326a-83e0-c7d9-3b6777c5aacc (at 10.149.14.157@o2ib313) [491002.554604] Lustre: Skipped 171 previous similar messages [491838.622042] Lustre: MGS: Connection restored to a7af365b-7716-6868-af6b-360e3ab3adb8 (at 10.153.10.175@o2ib233) [491838.622048] Lustre: Skipped 61 previous similar messages [492563.591980] Lustre: MGS: Connection restored to 2c32692c-b89b-f527-478a-3c7d07ecb92e (at 10.149.8.224@o2ib313) [492563.591986] Lustre: Skipped 991 previous similar messages [493234.300708] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [493234.300714] Lustre: Skipped 271 previous similar messages [493297.187222] Lustre: nbp8-MDT0000: haven't heard from client ffcd3f9d-70cf-99e3-b35b-795ac063f7ae (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3e6ac6800, cur 1591175406 expire 1591175256 last 1591175179 [494627.509219] Lustre: MGS: Connection restored to 81dad6cf-5538-2644-ce50-5f056a93b5aa (at 10.151.37.147@o2ib) [494627.509225] Lustre: Skipped 127 previous similar messages [494792.663674] Lustre: MGS: Connection restored to e7211b94-cb22-a945-ee99-be2dfdf9820c (at 10.149.16.54@o2ib313) [494792.663680] Lustre: Skipped 3 previous similar messages [494952.273956] Lustre: MGS: Connection restored to cbc57cfc-18ab-f00d-9a5f-0c0394a0135d (at 10.151.13.128@o2ib) [494952.273961] Lustre: Skipped 59 previous similar messages [495317.198337] Lustre: MGS: Connection restored to b1790f91-7ad6-bdbb-2016-d990f2efd3c6 (at 10.151.15.203@o2ib) [495317.198343] Lustre: Skipped 101 previous similar messages [496075.833016] Lustre: MGS: Connection restored to cfd65234-a1c6-a41d-a9f5-92798a0b911e (at 10.151.3.35@o2ib) [496075.833022] Lustre: Skipped 255 previous similar messages [497341.817822] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [497341.817827] Lustre: Skipped 767 previous similar messages [497432.170141] Lustre: MGS: Connection restored to b930058a-3d9c-8e6e-ac53-15d5bc088355 (at 10.151.33.84@o2ib) [497432.170150] Lustre: Skipped 1 previous similar message [497612.811382] Lustre: MGS: Connection restored to d4a7f315-556b-99db-d3d2-9ba54e2b6bd2 (at 10.151.28.244@o2ib) [497612.811394] Lustre: Skipped 33 previous similar messages [498098.615049] Lustre: MGS: Connection restored to 43244a41-0e26-3af4-e4be-9bb4f39f1557 (at 10.151.3.183@o2ib) [498098.615054] Lustre: Skipped 59 previous similar messages [498698.819259] Lustre: MGS: Connection restored to a499e01d-7352-f71b-8417-950eb6337a07 (at 10.151.18.175@o2ib) [498698.819265] Lustre: Skipped 306 previous similar messages [499337.067586] Lustre: MGS: Connection restored to 4ac36102-330f-cb94-453d-76ebfd2fdc97 (at 10.151.33.61@o2ib) [499337.067591] Lustre: Skipped 330 previous similar messages [499411.410277] Lustre: nbp8-MDT0000: haven't heard from client eaf36be3-6918-f4ba-2aac-4fa5203738e8 (at 10.149.7.101@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897a7061c400, cur 1591181520 expire 1591181370 last 1591181293 [499411.483526] Lustre: Skipped 1 previous similar message [499570.417107] Lustre: nbp8-MDT0000: haven't heard from client 79e90523-0624-8e61-92f2-2d07a4796ad5 (at 10.153.10.89@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979b43d1400, cur 1591181679 expire 1591181529 last 1591181452 [499570.490353] Lustre: Skipped 1 previous similar message [500229.188354] Lustre: MGS: Connection restored to 604293af-9ea3-ebb3-1e89-16ea97e80431 (at 10.149.14.2@o2ib313) [500229.188360] Lustre: Skipped 47 previous similar messages [501176.992895] Lustre: MGS: Connection restored to ecce0121-d1eb-dc13-6ad3-1e09eee96902 (at 10.151.56.123@o2ib) [501176.992901] Lustre: Skipped 491 previous similar messages [501966.997458] Lustre: MGS: Connection restored to 9185efc9-a617-8e04-5abb-906c3759006b (at 10.151.57.233@o2ib) [501966.997464] Lustre: Skipped 129 previous similar messages [502602.111153] Lustre: MGS: Connection restored to d7d2a04f-9038-7bdd-ea8b-fcf31f6494e7 (at 10.151.12.210@o2ib) [502602.111158] Lustre: Skipped 1 previous similar message [503285.074298] Lustre: MGS: Connection restored to 08be1e8b-5ac9-9570-50cf-feb920093f88 (at 10.149.11.183@o2ib313) [503285.074303] Lustre: Skipped 157 previous similar messages [503928.527409] Lustre: MGS: Connection restored to e154ec6d-9a5f-79db-1831-5e3efeb7449e (at 10.153.10.10@o2ib233) [503928.527415] Lustre: Skipped 83 previous similar messages [504680.797985] Lustre: MGS: Connection restored to 8d35d970-3ef3-fdf7-17a3-4863cf85dc28 (at 10.151.8.111@o2ib) [504680.797990] Lustre: Skipped 793 previous similar messages [504996.614876] Lustre: nbp8-MDT0000: haven't heard from client cc93b295-58eb-4d77-638f-c5a4dbe3dd21 (at 10.141.6.244@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a262ef9000, cur 1591187105 expire 1591186955 last 1591186878 [504996.688129] Lustre: Skipped 1 previous similar message [505370.853202] Lustre: MGS: Connection restored to 752a5f3f-6951-4226-e2d9-83fbdc4a19a4 (at 10.141.5.185@o2ib417) [505370.853209] Lustre: Skipped 557 previous similar messages [506348.790020] Lustre: MGS: Connection restored to 5568d9d0-945f-0025-6e3e-92dd3513213d (at 10.141.6.35@o2ib417) [506348.790026] Lustre: Skipped 31 previous similar messages [507040.022994] Lustre: MGS: Connection restored to bdad8c8e-e2b6-a64d-dba6-5c06a799fc2e (at 10.151.50.57@o2ib) [507040.023000] Lustre: Skipped 103 previous similar messages [507282.697617] Lustre: nbp8-MDT0000: haven't heard from client 170aa36b-dd28-3a95-07c3-3c54a63f396d (at 10.151.6.160@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974943eb800, cur 1591189391 expire 1591189241 last 1591189164 [507282.770015] Lustre: Skipped 3 previous similar messages [507361.610962] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [507361.644463] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.211@o2ib (302): c: 30, oc: 0, rc: 32 [507368.611318] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [507368.644816] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [507368.678019] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.118@o2ib (311): c: 30, oc: 0, rc: 32 [507368.718654] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [507370.612392] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [507370.645892] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [507370.679096] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.162@o2ib (314): c: 30, oc: 0, rc: 32 [507370.719725] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [507374.611543] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [507374.645041] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.170@o2ib (317): c: 30, oc: 0, rc: 32 [507379.611727] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [507379.645193] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [507379.678677] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.140@o2ib (323): c: 30, oc: 0, rc: 32 [507379.719306] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [507388.612060] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [507388.645559] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [507388.679045] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.50@o2ib (329): c: 30, oc: 0, rc: 32 [507388.719395] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [507405.612561] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [507405.646076] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 11 previous similar messages [507405.679850] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.231@o2ib (347): c: 30, oc: 0, rc: 32 [507405.720483] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 11 previous similar messages [507801.294772] Lustre: MGS: Connection restored to bf5cc7c2-11dc-5a3a-f17d-e126bbb7b48c (at 10.149.1.227@o2ib313) [507801.294778] Lustre: Skipped 313 previous similar messages [508411.248769] Lustre: MGS: Connection restored to 20e3e02b-e529-da98-9dfc-47930544d4d8 (at 10.151.12.107@o2ib) [508411.248775] Lustre: Skipped 93 previous similar messages [509022.511095] Lustre: MGS: Connection restored to 94a340ad-d6a7-4657-c76e-539adc70c31a (at 10.151.35.118@o2ib) [509022.511101] Lustre: Skipped 189 previous similar messages [509659.025510] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [509659.025515] Lustre: Skipped 1153 previous similar messages [510280.816050] Lustre: MGS: Connection restored to c56fab8b-5ca6-f403-05ba-32cf9611bf80 (at 10.151.46.21@o2ib) [510280.816056] Lustre: Skipped 89 previous similar messages [510938.096089] Lustre: MGS: Connection restored to 52a3b80c-758b-2878-d26b-037ebb23efda (at 10.151.4.72@o2ib) [510938.096094] Lustre: Skipped 145 previous similar messages [511593.671492] Lustre: MGS: Connection restored to c92768bf-630d-f9ca-d6e2-fbc3352df78d (at 10.153.12.151@o2ib233) [511593.671498] Lustre: Skipped 81 previous similar messages [512274.833722] Lustre: MGS: Connection restored to 04c77973-ece1-ffd6-49be-803a47b9b9ea (at 10.151.46.52@o2ib) [512274.833728] Lustre: Skipped 255 previous similar messages [512958.465387] Lustre: MGS: Connection restored to 781aa790-f031-de61-e376-f489a75d8eed (at 10.151.54.118@o2ib) [512958.465393] Lustre: Skipped 655 previous similar messages [513673.082973] Lustre: MGS: Connection restored to 1e6ad426-296f-f3a7-79ae-2d3ff733d65c (at 10.153.15.170@o2ib233) [513673.082978] Lustre: Skipped 279 previous similar messages [514285.886166] Lustre: MGS: Connection restored to 9b1093c2-681f-9b98-477d-8c4cac071790 (at 10.151.3.123@o2ib) [514285.886172] Lustre: Skipped 559 previous similar messages [514907.368657] Lustre: MGS: Connection restored to aa43a43c-4304-a3ed-f630-1361a160a7f0 (at 10.151.42.162@o2ib) [514907.368663] Lustre: Skipped 65 previous similar messages [515555.467756] Lustre: MGS: Connection restored to a4d15844-4c67-1d70-5198-028f5b441294 (at 10.151.3.84@o2ib) [515555.467761] Lustre: Skipped 495 previous similar messages [516223.339149] Lustre: MGS: Connection restored to 104a08eb-78d7-fdd4-57c6-e261bbb3a21c (at 10.153.11.23@o2ib233) [516223.339155] Lustre: Skipped 345 previous similar messages [516756.953838] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [516756.987324] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [516757.020819] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.152@o2ib (304): c: 32, oc: 0, rc: 32 [516757.061732] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [516852.122989] Lustre: MGS: Connection restored to cd66dc97-b25f-efcf-2d6c-4b8753350246 (at 10.153.11.51@o2ib233) [516852.122996] Lustre: Skipped 387 previous similar messages [517527.329279] Lustre: MGS: Connection restored to d438212c-8644-db3a-1343-796e51aa7d04 (at 10.151.54.119@o2ib) [517527.329285] Lustre: Skipped 77 previous similar messages [518168.259437] Lustre: MGS: Connection restored to c29e18ee-a942-3f85-6a5f-c944b9a24719 (at 10.151.39.11@o2ib) [518168.259443] Lustre: Skipped 9 previous similar messages [518960.154919] Lustre: MGS: Connection restored to 8771b9c7-1275-f6b6-a329-8430873a9cec (at 10.151.3.198@o2ib) [518960.154924] Lustre: Skipped 35 previous similar messages [519761.631849] Lustre: MGS: Connection restored to 15f7fb9d-d5cc-b843-e91e-519630276498 (at 10.151.57.113@o2ib) [519761.631855] Lustre: Skipped 343 previous similar messages [520373.499361] Lustre: MGS: Connection restored to e391c71f-f17d-e8ab-24a3-b23b7558dd79 (at 10.151.37.136@o2ib) [520373.499367] Lustre: Skipped 73 previous similar messages [520987.290059] Lustre: MGS: Connection restored to 8182afeb-81ed-6f83-8a6d-2424f46f9f24 (at 10.141.6.86@o2ib417) [520987.290065] Lustre: Skipped 1681 previous similar messages [521625.199439] Lustre: MGS: Connection restored to d8887404-9482-f94b-819e-8ef5d90d1df0 (at 10.151.38.156@o2ib) [521625.199444] Lustre: Skipped 57 previous similar messages [521895.141688] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [521895.175145] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [521895.208345] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.246@o2ib (218): c: 32, oc: 0, rc: 32 [521895.249258] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [521896.141820] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [521896.175321] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.249@o2ib (210): c: 32, oc: 0, rc: 32 [522299.246151] Lustre: MGS: Connection restored to a586f68b-67a7-1987-a05c-95c381984f93 (at 10.141.6.161@o2ib417) [522299.246156] Lustre: Skipped 369 previous similar messages [522454.162234] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [522454.195720] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.195@o2ib (281): c: 32, oc: 0, rc: 32 [522949.599192] Lustre: MGS: Connection restored to 55ea70e4-281d-85f0-d11c-d53b0e4f3d30 (at 10.151.36.188@o2ib) [522949.599198] Lustre: Skipped 257 previous similar messages [523562.557935] Lustre: MGS: Connection restored to 074e2a5c-dc61-0286-6d72-0fac885e1253 (at 10.141.7.54@o2ib417) [523562.557941] Lustre: Skipped 173 previous similar messages [523615.204588] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [523615.238091] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.146@o2ib (302): c: 32, oc: 0, rc: 32 [523647.205796] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [523647.239289] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.109@o2ib (303): c: 32, oc: 0, rc: 32 [524260.714041] Lustre: MGS: Connection restored to b244d0c6-c4db-7aee-c8c9-2e3cc3a1a2af (at 10.151.52.22@o2ib) [524260.714046] Lustre: Skipped 57 previous similar messages [524782.247274] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [524782.280751] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.49@o2ib (296): c: 32, oc: 0, rc: 32 [524935.217023] Lustre: MGS: Connection restored to 789c1fc6-a260-39a7-797f-a7c00e343f6f (at 10.151.36.217@o2ib) [524935.217029] Lustre: Skipped 543 previous similar messages [525064.348815] Lustre: MGS: haven't heard from client a027e453-9aee-d9e5-fc27-8e17a6e61f69 (at 10.153.12.197@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d01d70c00, cur 1591207172 expire 1591207022 last 1591206945 [525064.419777] Lustre: Skipped 73 previous similar messages [525514.364088] Lustre: nbp8-MDT0000: haven't heard from client e2bd0ad2-4927-db13-ed54-881e42279beb (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978ffb28000, cur 1591207622 expire 1591207472 last 1591207395 [525514.437336] Lustre: Skipped 1 previous similar message [525590.366857] Lustre: nbp8-MDT0000: haven't heard from client 9c129f43-6c14-1288-46f2-e1febee67237 (at 10.153.11.13@o2ib233) in 180 seconds. I think it's dead, and I am evicting it. exp ffff8999ebedb800, cur 1591207698 expire 1591207548 last 1591207518 [525590.440111] Lustre: Skipped 1 previous similar message [525606.723646] Lustre: MGS: Connection restored to 81059b6a-4c64-beeb-5353-d8dd06497790 (at 10.153.12.197@o2ib233) [525606.723655] Lustre: Skipped 139 previous similar messages [526340.794713] Lustre: MGS: Connection restored to 63e018ec-dad2-a1c2-463e-aee3a1eae544 (at 10.151.46.103@o2ib) [526340.794719] Lustre: Skipped 589 previous similar messages [526966.666174] Lustre: MGS: Connection restored to d525d984-c92b-e5f9-646d-3d7a7d72f272 (at 10.141.5.103@o2ib417) [526966.666180] Lustre: Skipped 55 previous similar messages [527672.386357] Lustre: MGS: Connection restored to 39dcff46-1e66-73f7-4e80-145b44c091ea (at 10.151.7.169@o2ib) [527672.386362] Lustre: Skipped 79 previous similar messages [528341.572927] Lustre: MGS: Connection restored to 280cc562-9abc-99d6-29d8-ff98418cc387 (at 10.149.10.80@o2ib313) [528341.572932] Lustre: Skipped 95 previous similar messages [528982.939111] Lustre: MGS: Connection restored to f08fecf0-02d2-6abf-ce93-b95c4d243c8d (at 10.151.29.139@o2ib) [528982.939117] Lustre: Skipped 75 previous similar messages [529729.518125] Lustre: nbp8-MDT0000: haven't heard from client 9146b1a2-7e9f-7a1b-3eee-258c660f04d1 (at 10.151.19.134@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b84615c00, cur 1591211837 expire 1591211687 last 1591211610 [529729.590804] Lustre: Skipped 1 previous similar message [529730.428252] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [529730.461748] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.142@o2ib (208): c: 32, oc: 0, rc: 32 [529734.508193] Lustre: MGS: Connection restored to 8ed87ef7-7205-8a2f-86d1-8c5702d47622 (at 10.151.7.140@o2ib) [529734.508199] Lustre: Skipped 213 previous similar messages [529736.428493] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [529736.461990] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.154@o2ib (299): c: 32, oc: 0, rc: 32 [529743.428611] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [529743.462119] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.167@o2ib (303): c: 32, oc: 0, rc: 32 [529756.429184] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [529756.462677] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.0.210@o2ib (284): c: 32, oc: 0, rc: 32 [529785.430142] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [529785.463637] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.150@o2ib (303): c: 32, oc: 0, rc: 32 [529806.430908] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [529806.464399] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.0.208@o2ib (303): c: 32, oc: 0, rc: 32 [529827.432799] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [529827.466295] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.134@o2ib (324): c: 30, oc: 0, rc: 32 [530346.163184] Lustre: MGS: Connection restored to 61a4cef9-b12e-4d27-cf7f-cebdc87fd7ba (at 10.141.3.90@o2ib417) [530346.163190] Lustre: Skipped 125 previous similar messages [530947.520922] Lustre: MGS: Connection restored to 2482cf75-76ad-b172-82fd-719add69f1b0 (at 10.151.10.221@o2ib) [530947.520928] Lustre: Skipped 37 previous similar messages [531601.816313] Lustre: MGS: Connection restored to 0100a26a-ac59-c806-ad87-4490ea4abf15 (at 10.151.32.95@o2ib) [531601.816319] Lustre: Skipped 679 previous similar messages [532242.168064] Lustre: MGS: Connection restored to c6c5a086-c838-fb8f-84c9-bafd1fc80838 (at 10.151.9.146@o2ib) [532242.168069] Lustre: Skipped 117 previous similar messages [532987.046765] Lustre: MGS: Connection restored to 343c3d2c-411a-63e6-3370-d13cf1510aee (at 10.151.0.70@o2ib) [532987.046771] Lustre: Skipped 615 previous similar messages [533741.766531] Lustre: MGS: Connection restored to cf2676ba-e2eb-648d-9a5e-b72990b2fa58 (at 10.151.14.13@o2ib) [533741.766537] Lustre: Skipped 93 previous similar messages [534373.075140] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [534373.075146] Lustre: Skipped 145 previous similar messages [535104.444306] Lustre: MGS: Connection restored to 2ec8060f-6e0b-63ce-f183-f0b409adfa2c (at 10.151.9.45@o2ib) [535104.444312] Lustre: Skipped 91 previous similar messages [535781.072439] Lustre: MGS: Connection restored to bfc06ab4-f0bf-ed0f-c177-da2925f5d579 (at 10.151.38.16@o2ib) [535781.072445] Lustre: Skipped 53 previous similar messages [537112.409326] Lustre: MGS: Connection restored to 026d7d72-c302-e6d1-c0e4-a9247777c99d (at 10.149.3.221@o2ib313) [537112.409331] Lustre: Skipped 189 previous similar messages [537188.090423] Lustre: MGS: Connection restored to 1a1bdf34-d511-7c1f-9e96-156c9a8b7588 (at 10.151.33.23@o2ib) [537188.090429] Lustre: Skipped 45 previous similar messages [537445.275453] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [537445.275458] Lustre: Skipped 59 previous similar messages [537759.930457] Lustre: MGS: Connection restored to de844f28-b79e-7e43-f2d1-b2515fa4ef2b (at 10.151.8.235@o2ib) [537759.930463] Lustre: Skipped 29 previous similar messages [538361.883014] Lustre: MGS: Connection restored to 389245ba-42ab-947f-d925-8334404ed62e (at 10.151.30.164@o2ib) [538361.883020] Lustre: Skipped 97 previous similar messages [539094.623226] Lustre: MGS: Connection restored to 78f22f6c-b13a-d96a-afa1-66d75553e939 (at 10.151.4.40@o2ib) [539094.623231] Lustre: Skipped 251 previous similar messages [539996.912940] Lustre: MGS: Connection restored to 8f1f6b54-faf6-7c78-98ed-6f6c4c2a9900 (at 10.151.3.224@o2ib) [539996.912946] Lustre: Skipped 23 previous similar messages [540444.819918] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [540444.853418] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.142@o2ib (289): c: 32, oc: 0, rc: 32 [540452.820117] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [540452.853623] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.157@o2ib (301): c: 32, oc: 0, rc: 32 [540604.571977] Lustre: MGS: Connection restored to 2a8d2439-c44e-aec6-2cd2-bba9e6549b2f (at 10.151.36.118@o2ib) [540604.571983] Lustre: Skipped 115 previous similar messages [540840.834315] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [540840.867810] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [540840.901013] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.49@o2ib (222): c: 32, oc: 0, rc: 32 [540840.941360] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [541322.310515] Lustre: MGS: Connection restored to a7d64437-ecbb-242f-b3ec-58bcb0880f55 (at 10.151.32.84@o2ib) [541322.310520] Lustre: Skipped 259 previous similar messages [541924.674511] Lustre: MGS: Connection restored to 3b6c8c77-7195-4bd5-3ad6-7a6b2df31aef (at 10.153.13.43@o2ib233) [541924.674516] Lustre: Skipped 223 previous similar messages [542570.124859] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [542570.124864] Lustre: Skipped 1701 previous similar messages [542900.909820] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [542900.943316] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.170@o2ib (303): c: 32, oc: 0, rc: 32 [542935.910986] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [542935.944481] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.23@o2ib (303): c: 32, oc: 0, rc: 32 [543105.917205] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [543105.950705] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.114@o2ib (290): c: 32, oc: 0, rc: 32 [543183.419750] Lustre: MGS: Connection restored to 1323301a-730b-3fbd-7f61-acda8cc44155 (at 10.151.8.114@o2ib) [543183.419759] Lustre: Skipped 293 previous similar messages [543431.930229] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [543431.963740] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.75@o2ib (303): c: 32, oc: 0, rc: 32 [543783.035281] Lustre: nbp8-MDT0000: haven't heard from client 94a5a65d-9f2f-ac0c-7df6-a92bccf65535 (at 10.153.10.224@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89815ba6cc00, cur 1591225890 expire 1591225740 last 1591225663 [543783.108814] Lustre: Skipped 1 previous similar message [543896.542946] Lustre: MGS: Connection restored to dd957d2e-4fa2-e4b8-dc08-08927fd612cb (at 10.141.5.127@o2ib417) [543896.542951] Lustre: Skipped 119 previous similar messages [544506.275587] Lustre: MGS: Connection restored to 0806563e-47e9-bb0e-3012-cafd5d9c0ce4 (at 10.151.56.22@o2ib) [544506.275593] Lustre: Skipped 103 previous similar messages [545167.295258] Lustre: MGS: Connection restored to 9d4a8184-1cb9-4257-92ed-396943bc918c (at 10.151.44.21@o2ib) [545167.295263] Lustre: Skipped 299 previous similar messages [545889.427455] Lustre: MGS: Connection restored to e3d0963f-9bc0-9974-07b3-fa0f7712ba5a (at 10.151.14.86@o2ib) [545889.427460] Lustre: Skipped 209 previous similar messages [546694.871998] Lustre: MGS: Connection restored to 1a162bfd-345b-63c7-53b3-820eead01574 (at 10.149.3.52@o2ib313) [546694.872004] Lustre: Skipped 249 previous similar messages [547622.091125] Lustre: MGS: Connection restored to cef012f0-1a32-80ec-024a-0496ce3dc750 (at 10.151.32.70@o2ib) [547622.091130] Lustre: Skipped 149 previous similar messages [548440.660377] Lustre: MGS: Connection restored to 30aa3224-2587-e785-2268-30465e6d1028 (at 10.151.6.74@o2ib) [548440.660383] Lustre: Skipped 327 previous similar messages [549219.017512] Lustre: MGS: Connection restored to ec7e1b1a-2490-0435-2913-86f004cd4d5e (at 10.151.18.111@o2ib) [549219.017517] Lustre: Skipped 369 previous similar messages [549873.839767] Lustre: MGS: Connection restored to 58244d36-3e85-4029-f8b9-5cf941ad3acd (at 10.151.36.179@o2ib) [549873.839773] Lustre: Skipped 59 previous similar messages [550761.513835] Lustre: MGS: Connection restored to 041ecd90-5245-4515-2fa8-a752500f34ba (at 10.149.16.41@o2ib313) [550761.513841] Lustre: Skipped 281 previous similar messages [551597.935760] Lustre: MGS: Connection restored to 4daa1e44-d5d3-2154-d3a2-c05e55fce4e8 (at 10.149.11.158@o2ib313) [551597.935766] Lustre: Skipped 99 previous similar messages [552400.724533] Lustre: MGS: Connection restored to b3dfe7b8-9b3b-b0f5-9e39-e84d2dba53ff (at 10.151.34.144@o2ib) [552400.724538] Lustre: Skipped 139 previous similar messages [553091.511377] Lustre: MGS: Connection restored to 05d9f6ce-f065-77d6-95a8-8b646e73a08d (at 10.151.14.198@o2ib) [553091.511383] Lustre: Skipped 187 previous similar messages [553190.375126] Lustre: nbp8-MDT0000: haven't heard from client 6307a400-265a-38ff-6458-3f5f2db75c5f (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c6e18c00, cur 1591235297 expire 1591235147 last 1591235070 [553190.448383] Lustre: Skipped 1 previous similar message [553191.378376] Lustre: MGS: haven't heard from client 23727530-0022-dfb3-145b-fd1894d2b27c (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89743fe8f400, cur 1591235298 expire 1591235148 last 1591235071 [553706.049869] Lustre: MGS: Connection restored to 36db3890-6bec-71ee-2445-a06d59727b10 (at 10.151.24.69@o2ib) [553706.049875] Lustre: Skipped 99 previous similar messages [553778.397853] Lustre: MGS: haven't heard from client e9c0265f-79c6-6c90-0876-ce9b602168d7 (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974b3e90800, cur 1591235885 expire 1591235735 last 1591235658 [553790.399450] Lustre: nbp8-MDT0000: haven't heard from client 67dd1d8e-0a7b-a057-28d7-e067bc66eb77 (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998b874fc00, cur 1591235897 expire 1591235747 last 1591235670 [554320.085833] Lustre: MGS: Connection restored to e1544100-30cf-7b5f-cdbe-167759d2882b (at 10.151.44.74@o2ib) [554320.085838] Lustre: Skipped 53 previous similar messages [554357.418659] Lustre: nbp8-MDT0000: haven't heard from client 98196858-3919-f349-7001-f717355a7d67 (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b34f60800, cur 1591236464 expire 1591236314 last 1591236237 [554635.431563] Lustre: MGS: haven't heard from client 17b7454f-4573-a42a-fe25-fa1343e1981c (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89796c640c00, cur 1591236742 expire 1591236592 last 1591236515 [554635.502239] Lustre: Skipped 1 previous similar message [554967.167437] Lustre: MGS: Connection restored to aa82aa1b-d594-a0f4-f7df-97f1fea0f71e (at 10.153.10.80@o2ib233) [554967.167443] Lustre: Skipped 147 previous similar messages [555229.452372] Lustre: MGS: haven't heard from client 0c97f4cd-b9d7-7bb9-ec42-f455a9b5c502 (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897bfc766000, cur 1591237336 expire 1591237186 last 1591237109 [555229.523055] Lustre: Skipped 1 previous similar message [555509.465387] Lustre: MGS: haven't heard from client 88c7feb1-02d7-10d7-e4ff-5241c4641468 (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ef0ffa400, cur 1591237616 expire 1591237466 last 1591237389 [555509.536103] Lustre: Skipped 1 previous similar message [555773.985379] Lustre: MGS: Connection restored to 56c061d5-db64-26e8-439f-4edf23d92a90 (at 10.149.3.65@o2ib313) [555773.985385] Lustre: Skipped 149 previous similar messages [556386.083125] Lustre: MGS: Connection restored to edba9f51-cf87-3622-27cf-2856ae5b2587 (at 10.151.3.51@o2ib) [556386.083131] Lustre: Skipped 335 previous similar messages [556757.505036] Lustre: MGS: haven't heard from client c1620913-011a-03f2-cc31-f0d2d141f80f (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976b030f400, cur 1591238864 expire 1591238714 last 1591238637 [556757.575713] Lustre: Skipped 1 previous similar message [556775.508015] Lustre: nbp8-MDT0000: haven't heard from client 32a2e980-d4ea-b8e3-600b-89ab66f4a3da (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3c561e800, cur 1591238882 expire 1591238732 last 1591238655 [557069.000138] Lustre: MGS: Connection restored to 762cadd2-31e0-412e-0549-fbf663d40bbc (at 10.141.5.129@o2ib417) [557069.000143] Lustre: Skipped 583 previous similar messages [557345.528229] Lustre: MGS: haven't heard from client 75b21178-7636-890d-f5e7-1fca76524cfe (at 10.153.10.80@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ad56a0400, cur 1591239452 expire 1591239302 last 1591239225 [557626.544027] Lustre: MGS: haven't heard from client dc315182-6b25-de27-e5fc-5dbcadf401d4 (at 10.153.10.93@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897accbecc00, cur 1591239733 expire 1591239583 last 1591239506 [557626.614716] Lustre: Skipped 1 previous similar message [557858.152560] Lustre: MGS: Connection restored to 481bd6cc-3df1-0543-f566-6edebc2162d4 (at 10.151.56.35@o2ib) [557858.152565] Lustre: Skipped 239 previous similar messages [558237.566662] Lustre: MGS: haven't heard from client a3e8fcbf-a986-1216-9e72-d436ca4eed7d (at 10.153.10.93@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b52e9c00, cur 1591240344 expire 1591240194 last 1591240117 [558237.637361] Lustre: Skipped 1 previous similar message [558258.250105] LNet: 24769:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.18@o2ib version 12/12 incarnation 1590086743063283/1591240360579877 [558462.713518] Lustre: MGS: Connection restored to 604293af-9ea3-ebb3-1e89-16ea97e80431 (at 10.149.14.2@o2ib313) [558462.713523] Lustre: Skipped 223 previous similar messages [558683.575087] Lustre: nbp8-MDT0000: haven't heard from client d960681e-9392-3d25-536d-f68b19bf1823 (at 10.151.47.33@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a321f65400, cur 1591240790 expire 1591240640 last 1591240563 [558683.647486] Lustre: Skipped 3 previous similar messages [558763.489142] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [558763.522635] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.33@o2ib (306): c: 30, oc: 0, rc: 32 [558796.490350] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [558796.523856] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.23@o2ib (335): c: 30, oc: 0, rc: 32 [559203.595390] Lustre: nbp8-MDT0000: haven't heard from client 6efa30be-6abc-9da2-d914-954d16b1f4c1 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e0f328000, cur 1591241310 expire 1591241160 last 1591241083 [559203.668656] Lustre: Skipped 3 previous similar messages [559306.180251] Lustre: MGS: Connection restored to 6967304c-0a5c-4cc9-2aaf-9a2f70760631 (at 10.151.47.38@o2ib) [559306.180257] Lustre: Skipped 11 previous similar messages [559927.087751] Lustre: MGS: Connection restored to 2331a165-6364-a9ca-8de7-212d145b8fa8 (at 10.151.56.51@o2ib) [559927.087756] Lustre: Skipped 327 previous similar messages [560000.625710] Lustre: MGS: haven't heard from client 2ce0cffe-6b31-c87b-f87b-57aad6b08ff2 (at 10.153.17.81@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b52ef000, cur 1591242107 expire 1591241957 last 1591241880 [560000.696398] Lustre: Skipped 3 previous similar messages [560012.624970] LustreError: 8631:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.153.17.81@o2ib233) failed to reply to blocking AST (req@ffff897338f65100 x1667957308475328 status 0 rc -5), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff89a323af72c0/0xa22cee32f6f8cedc lrc: 5/0,0 mode: PR/PR res: [0x36062a07b:0x734f:0x0].0x0 bits 0x13/0x0 rrc: 8 type: IBT flags: 0x70200400000020 nid: 10.153.17.81@o2ib233 remote: 0x4caf938430e8c99d expref: 16 pid: 12654 timeout: 560187 lvb_type: 0 [560012.767778] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.153.17.81@o2ib233 was evicted due to a lock blocking callback time out: rc -5 [560646.007057] Lustre: MGS: Connection restored to cac98f4f-09fd-ce73-9452-5acc8f8da83a (at 10.151.56.23@o2ib) [560646.007062] Lustre: Skipped 37 previous similar messages [561311.560350] Lustre: MGS: Connection restored to 6e355c65-649e-7c51-17e3-e00877d65351 (at 10.153.17.81@o2ib233) [561311.560356] Lustre: Skipped 61 previous similar messages [561905.692793] Lustre: nbp8-MDT0000: haven't heard from client 54a1b4a2-069b-2a40-6269-ba91447edcd3 (at 10.153.17.232@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ea0157000, cur 1591244012 expire 1591243862 last 1591243785 [561905.766351] Lustre: Skipped 1 previous similar message [562255.455330] Lustre: MGS: Connection restored to 7a144ea3-0fd7-0def-e614-0ee83e5cacf6 (at 10.153.18.154@o2ib233) [562255.455335] Lustre: Skipped 59 previous similar messages [562884.800285] Lustre: MGS: Connection restored to c3e41625-7382-ced5-cd88-3ba3c8b9a5f1 (at 10.151.32.22@o2ib) [562884.800290] Lustre: Skipped 237 previous similar messages [562966.731835] Lustre: nbp8-MDT0000: haven't heard from client 5a32dde3-78a4-8a64-6a6e-59e498f1ef6c (at 10.151.46.148@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a321f63400, cur 1591245073 expire 1591244923 last 1591244846 [562966.804522] Lustre: Skipped 1 previous similar message [563086.646947] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [563086.680440] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.148@o2ib (346): c: 30, oc: 0, rc: 32 [563088.646908] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [563088.680411] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.151@o2ib (342): c: 30, oc: 0, rc: 32 [564431.217028] Lustre: MGS: Connection restored to 12a4908d-852d-963b-4c17-cc86ac14128d (at 10.151.43.33@o2ib) [564431.217033] Lustre: Skipped 727 previous similar messages [564763.829314] Lustre: MGS: Connection restored to 40a4b46b-60f8-9a97-9d3c-7b1cdd7c48be (at 10.151.6.168@o2ib) [564763.829320] Lustre: Skipped 1 previous similar message [564951.119054] Lustre: MGS: Connection restored to 7baee6d1-245b-4784-0b93-fdbb9a836064 (at 10.153.18.155@o2ib233) [564951.119059] Lustre: Skipped 265 previous similar messages [565307.035161] Lustre: MGS: Connection restored to fbaa5620-a95f-2065-f92d-26e56d041e4c (at 10.151.12.115@o2ib) [565307.035167] Lustre: Skipped 131 previous similar messages [565921.437535] Lustre: MGS: Connection restored to 6052daa6-f317-9f2c-9b8f-74f9f4f1cb20 (at 10.151.47.81@o2ib) [565921.437541] Lustre: Skipped 97 previous similar messages [566587.972397] Lustre: MGS: Connection restored to f38ea47f-0668-637e-62bf-a5ae8505d788 (at 10.151.34.116@o2ib) [566587.972403] Lustre: Skipped 209 previous similar messages [567277.094055] Lustre: MGS: Connection restored to 46b088bd-a5ed-a9c2-cdc2-0cc7b43ce3da (at 10.153.10.197@o2ib233) [567277.094060] Lustre: Skipped 437 previous similar messages [568049.012551] Lustre: MGS: Connection restored to dcfd30a8-e97a-7b26-69eb-93abd5f05e2a (at 10.151.44.111@o2ib) [568049.012558] Lustre: Skipped 145 previous similar messages [568628.938638] Lustre: MGS: haven't heard from client c577880a-ef10-cdbc-3f42-f3c6488ddfcc (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979b8ff8c00, cur 1591250735 expire 1591250585 last 1591250508 [568629.009341] Lustre: Skipped 5 previous similar messages [568673.523134] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [568673.523139] Lustre: Skipped 135 previous similar messages [568704.941148] Lustre: nbp8-MDT0000: haven't heard from client 068d5fb6-4ca6-2dbe-7f82-fb486c543c56 (at 10.153.10.30@o2ib233) in 189 seconds. I think it's dead, and I am evicting it. exp ffff899778f08000, cur 1591250811 expire 1591250661 last 1591250622 [568705.014421] Lustre: Skipped 1 previous similar message [569323.975091] Lustre: MGS: haven't heard from client 3f51a56e-df6d-1bad-ea4c-288bde147f35 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c4fc1400, cur 1591251430 expire 1591251280 last 1591251203 [569324.045784] Lustre: Skipped 1 previous similar message [569334.965203] Lustre: nbp8-MDT0000: haven't heard from client dc0d2990-1bfd-91af-48a8-9e338b993b27 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b722d400, cur 1591251441 expire 1591251291 last 1591251214 [569355.678736] Lustre: MGS: Connection restored to 170e97a9-e559-32e1-fc3a-facb4786cfa4 (at 10.153.10.30@o2ib233) [569355.678742] Lustre: Skipped 125 previous similar messages [569617.975919] Lustre: MGS: haven't heard from client 6a62650d-d63b-1e37-e3e3-4c63cb13f2c2 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c5e55c400, cur 1591251724 expire 1591251574 last 1591251497 [569628.976235] Lustre: nbp8-MDT0000: haven't heard from client 90f8164a-bee1-47c4-8be9-633bba83d7dc (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c5e559800, cur 1591251735 expire 1591251585 last 1591251508 [569984.476726] Lustre: MGS: Connection restored to bdefe572-df66-c975-cf2e-796dda2bde4a (at 10.151.0.43@o2ib) [569984.476732] Lustre: Skipped 87 previous similar messages [570684.019510] Lustre: MGS: haven't heard from client c553d250-752a-d493-cc24-0aae743f01a8 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f24c15400, cur 1591252790 expire 1591252640 last 1591252563 [570686.245325] Lustre: MGS: Connection restored to dde6e29f-9dc7-ae35-ad80-7cb81a8cfa99 (at 10.151.3.173@o2ib) [570686.245331] Lustre: Skipped 371 previous similar messages [570710.014943] Lustre: nbp8-MDT0000: haven't heard from client e0c74106-74c5-2f3b-6b6b-6df87bdb484c (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998cf211400, cur 1591252816 expire 1591252666 last 1591252589 [571152.033274] Lustre: MGS: haven't heard from client 2a705aca-ed3e-03e7-ba52-67e0bc25cfd6 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899791f2b400, cur 1591253258 expire 1591253108 last 1591253031 [571187.033678] Lustre: nbp8-MDT0000: haven't heard from client 36a29ae1-9b79-4b51-8929-620fa18c23f1 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998f53b7c00, cur 1591253293 expire 1591253143 last 1591253066 [571303.199847] Lustre: MGS: Connection restored to d438212c-8644-db3a-1343-796e51aa7d04 (at 10.151.54.119@o2ib) [571303.199853] Lustre: Skipped 327 previous similar messages [571490.047217] Lustre: MGS: haven't heard from client 750b7348-cefa-6c80-f8cf-a60beefad547 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897713a5fc00, cur 1591253596 expire 1591253446 last 1591253369 [571506.047024] Lustre: nbp8-MDT0000: haven't heard from client 55421818-40cc-006e-c80b-58df46de3d83 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a215eab400, cur 1591253612 expire 1591253462 last 1591253385 [571984.540590] Lustre: MGS: Connection restored to 8034f406-27b7-344e-a455-5a83386ec884 (at 10.151.51.90@o2ib) [571984.540596] Lustre: Skipped 201 previous similar messages [572315.846387] Process accounting resumed [572664.169531] Lustre: MGS: Connection restored to 4dc185d8-e55d-27e8-b5e8-f283640f3cf0 (at 10.151.44.87@o2ib) [572664.169537] Lustre: Skipped 791 previous similar messages [573306.914517] Lustre: MGS: Connection restored to ac720340-4e75-42b4-29c4-52a67b441a7b (at 10.141.3.91@o2ib417) [573306.914523] Lustre: Skipped 11 previous similar messages [574143.960617] Lustre: MGS: Connection restored to 15ed3306-ed93-d6ab-ebbe-87098322fe69 (at 10.151.47.74@o2ib) [574143.960624] Lustre: Skipped 619 previous similar messages [574816.779266] Lustre: MGS: Connection restored to e3ad2374-3eae-ac19-ab2c-d144c1aaac75 (at 10.141.3.40@o2ib417) [574816.779272] Lustre: Skipped 177 previous similar messages [576237.716040] Lustre: MGS: Connection restored to 6388e37a-8f8c-c2ae-c61b-4d071358c379 (at 10.151.32.244@o2ib) [576237.716046] Lustre: Skipped 59 previous similar messages [576322.088417] Lustre: MGS: Connection restored to 1e6fdcda-4e0c-fcea-acf9-ffaa892e8793 (at 10.153.12.61@o2ib233) [576322.088423] Lustre: Skipped 25 previous similar messages [576490.311386] Lustre: MGS: Connection restored to 0f1c9f85-b0aa-8463-a012-18f9065e3a86 (at 10.149.3.72@o2ib313) [576490.311391] Lustre: Skipped 9 previous similar messages [576583.231968] Lustre: MGS: haven't heard from client c830ad13-6ce1-8822-0bbb-e9ad4595e1a6 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897509f5dc00, cur 1591258689 expire 1591258539 last 1591258462 [576586.230251] Lustre: nbp8-MDT0000: haven't heard from client 05ddbe79-625f-74db-8632-1c582af75c2f (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89996ef52c00, cur 1591258692 expire 1591258542 last 1591258465 [576843.011726] Lustre: MGS: Connection restored to 6f23c984-fc40-e305-97b1-b4286c4c87e9 (at 10.149.10.27@o2ib313) [576843.011732] Lustre: Skipped 51 previous similar messages [576950.246425] Lustre: MGS: haven't heard from client 55798fca-4117-b705-1fa6-fa926d5e2ca5 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89791839f000, cur 1591259056 expire 1591258906 last 1591258829 [576978.245228] Lustre: nbp8-MDT0000: haven't heard from client 330c42c3-e4cf-e066-0cc3-a9d4b9ba3e52 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999aae17000, cur 1591259084 expire 1591258934 last 1591258857 [577292.280883] Lustre: MGS: haven't heard from client ffcdf6fb-a5ad-86f1-5b02-f1d5e72eb396 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974cd0c9400, cur 1591259398 expire 1591259248 last 1591259171 [577320.259231] LustreError: 8585:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.153.10.75@o2ib233) failed to reply to blocking AST (req@ffff8973d5ab4380 x1667957467884928 status 0 rc -5), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff899be7734380/0xa22cee3340f01f08 lrc: 4/0,0 mode: PR/PR res: [0x36088c9df:0x3bc9:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.153.10.75@o2ib233 remote: 0x8706aaa75b745098 expref: 357 pid: 8582 timeout: 577626 lvb_type: 0 [577320.402000] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.153.10.75@o2ib233 was evicted due to a lock blocking callback time out: rc -5 [577453.920348] Lustre: MGS: Connection restored to b528722f-a2e9-a80d-9285-4377955381a3 (at 10.153.16.162@o2ib233) [577453.920353] Lustre: Skipped 347 previous similar messages [578135.766920] Lustre: MGS: Connection restored to 046535ed-06e8-41cf-690d-3e80cd89f93b (at 10.151.10.18@o2ib) [578135.766927] Lustre: Skipped 281 previous similar messages [578889.798335] Lustre: MGS: Connection restored to 623222d1-65dd-39ac-515d-3159d1ac7cf8 (at 10.151.28.201@o2ib) [578889.798340] Lustre: Skipped 237 previous similar messages [578980.321633] Lustre: MGS: haven't heard from client f12f298a-b321-f958-9a4d-e779c48722e2 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89750a6db800, cur 1591261086 expire 1591260936 last 1591260859 [578980.392304] Lustre: Skipped 1 previous similar message [578995.321392] Lustre: nbp8-MDT0000: haven't heard from client 8e53ed4b-9ada-3869-cba8-b9c4a2b10503 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2d7e17000, cur 1591261101 expire 1591260951 last 1591260874 [578995.394670] LustreError: 10528:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.153.10.75@o2ib233) failed to reply to blocking AST (req@ffff8973d3f42d00 x1667957487804288 status 0 rc -5), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff89a338a0a1c0/0xa22cee3346ceae1e lrc: 4/0,0 mode: PR/PR res: [0x36088c9df:0x3be3:0x0].0x0 bits 0x13/0x0 rrc: 6 type: IBT flags: 0x60200400000020 nid: 10.153.10.75@o2ib233 remote: 0x6726d9a1f0acecf3 expref: 365 pid: 12658 timeout: 579189 lvb_type: 0 [578995.538044] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.153.10.75@o2ib233 was evicted due to a lock blocking callback time out: rc -5 [579517.709810] Lustre: MGS: Connection restored to ebaba53d-6bc9-5584-14ee-ea956b72fd4e (at 10.151.5.190@o2ib) [579517.709815] Lustre: Skipped 999 previous similar messages [579634.342001] Lustre: MGS: haven't heard from client 73e90dd6-b5fa-cbd0-24a7-fee5f54f5a3c (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897aa8363800, cur 1591261740 expire 1591261590 last 1591261513 [579653.343436] Lustre: nbp8-MDT0000: haven't heard from client f0c5fb63-16ef-191a-2f9e-77436d06aed2 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a83250000, cur 1591261759 expire 1591261609 last 1591261532 [580135.778757] Lustre: MGS: Connection restored to 5497f614-e0c7-4cb7-4c6d-ecc09162bbfb (at 10.151.35.53@o2ib) [580135.778763] Lustre: Skipped 197 previous similar messages [580301.369898] Lustre: MGS: haven't heard from client 6a08f11d-7048-2db9-3bc5-fb36a2d17959 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898343b05c00, cur 1591262407 expire 1591262257 last 1591262180 [580324.366808] Lustre: nbp8-MDT0000: haven't heard from client fee8ea2d-930a-72f1-c1cf-ddea3b7193f1 (at 10.153.10.30@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899eecbea000, cur 1591262430 expire 1591262280 last 1591262203 [580544.376188] Lustre: nbp8-MDT0000: haven't heard from client 6fed901a-f7e2-0332-bf85-f260018e6445 (at 10.153.10.160@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974cc094000, cur 1591262650 expire 1591262500 last 1591262423 [580759.153250] Lustre: MGS: Connection restored to 3bdc5c29-5893-25c4-fbbf-9281a99b5db1 (at 10.151.33.53@o2ib) [580759.153255] Lustre: Skipped 41 previous similar messages [581432.820499] Lustre: MGS: Connection restored to c4466eb0-8609-6fb8-6bda-2d1708e1467f (at 10.151.35.68@o2ib) [581432.820505] Lustre: Skipped 95 previous similar messages [582071.910960] Lustre: MGS: Connection restored to e093138f-3cb2-c2da-1092-853ea0dcb2ec (at 10.151.37.106@o2ib) [582071.910966] Lustre: Skipped 17 previous similar messages [582737.680606] Lustre: MGS: Connection restored to 88f28a01-f699-6262-95a3-7222eaf27527 (at 10.151.9.225@o2ib) [582737.680612] Lustre: Skipped 13 previous similar messages [583341.136171] Lustre: MGS: Connection restored to 42e0ca7b-887c-2f86-e4ec-95b6531f8de7 (at 10.151.36.115@o2ib) [583341.136177] Lustre: Skipped 45 previous similar messages [583943.448994] Lustre: MGS: Connection restored to c553762c-4c5a-0ce4-f772-5ec5b149f0c9 (at 10.151.31.91@o2ib) [583943.448998] Lustre: Skipped 173 previous similar messages [584564.967674] Lustre: MGS: Connection restored to a74e42a8-9980-c508-f86c-d77ea0b46d1f (at 10.151.31.83@o2ib) [584564.967680] Lustre: Skipped 37 previous similar messages [585290.216097] Lustre: MGS: Connection restored to 095ceee9-5142-4a64-d4e7-d7d415974416 (at 10.151.35.11@o2ib) [585290.216103] Lustre: Skipped 419 previous similar messages [585933.725925] Lustre: MGS: Connection restored to 9164e415-b487-714e-e726-286b75e2d2b4 (at 10.151.33.127@o2ib) [585933.725934] Lustre: Skipped 59 previous similar messages [586556.571711] Lustre: MGS: Connection restored to 9662cb32-c9a8-476c-8f71-f49d0d19fd09 (at 10.151.33.128@o2ib) [586556.571717] Lustre: Skipped 41 previous similar messages [587213.563125] Lustre: MGS: Connection restored to ac720340-4e75-42b4-29c4-52a67b441a7b (at 10.141.3.91@o2ib417) [587213.563131] Lustre: Skipped 7 previous similar messages [587889.633289] Lustre: MGS: Connection restored to a78c09bf-393d-8f6d-e893-943157a65419 (at 10.149.10.26@o2ib313) [587889.633294] Lustre: Skipped 217 previous similar messages [588343.570314] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [588343.603830] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [588343.637032] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.177@o2ib (304): c: 32, oc: 0, rc: 32 [588343.677945] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [588504.121424] Lustre: MGS: Connection restored to 0e8d7337-649e-af76-e03a-af274d8d9e18 (at 10.151.39.103@o2ib) [588504.121430] Lustre: Skipped 299 previous similar messages [589342.177789] Lustre: MGS: Connection restored to c2e1c1d9-cdaa-7b89-5a9e-51cdb6494edb (at 10.149.1.0@o2ib313) [589342.177795] Lustre: Skipped 95 previous similar messages [590216.411543] Lustre: MGS: Connection restored to 9682da21-6b3a-7e6e-34d3-d0d530047ad3 (at 10.149.15.236@o2ib313) [590216.411549] Lustre: Skipped 81 previous similar messages [591030.257910] Lustre: MGS: Connection restored to 3cd332b8-6053-10d0-ebe0-14aa486aa344 (at 10.141.6.212@o2ib417) [591030.257915] Lustre: Skipped 69 previous similar messages [591743.030649] Lustre: MGS: Connection restored to 5cd1a5dc-48f4-e840-4551-d307c95444b0 (at 10.151.15.51@o2ib) [591743.030654] Lustre: Skipped 3 previous similar messages [592421.324623] Lustre: MGS: Connection restored to e1a6f3a0-209b-7479-fec2-b557eef3d659 (at 10.151.30.127@o2ib) [592421.324629] Lustre: Skipped 371 previous similar messages [593861.011284] Lustre: MGS: Connection restored to ed07eb39-6762-be17-4826-391c5b94a912 (at 10.151.35.48@o2ib) [593861.011289] Lustre: Skipped 5 previous similar messages [593936.456715] Lustre: MGS: Connection restored to e15be1e2-76bf-e47b-21f6-342557c76ef8 (at 10.151.16.161@o2ib) [593936.456721] Lustre: Skipped 19 previous similar messages [594244.379736] Lustre: MGS: Connection restored to 634d19d8-6ff2-df26-10be-4a9f1344c909 (at 10.151.33.182@o2ib) [594244.379742] Lustre: Skipped 121 previous similar messages [594546.453151] Lustre: MGS: Connection restored to 39ff2546-d110-6fab-3018-2ec9997e7bc3 (at 10.151.6.108@o2ib) [594546.453157] Lustre: Skipped 23 previous similar messages [595154.629450] Lustre: MGS: Connection restored to 7c9d6508-5157-532a-f793-a9a748e4116f (at 10.149.5.21@o2ib313) [595154.629456] Lustre: Skipped 95 previous similar messages [595792.412246] Lustre: MGS: Connection restored to 1162fd07-48cd-0df7-41fb-02e11989f83c (at 10.149.5.84@o2ib313) [595792.412251] Lustre: Skipped 273 previous similar messages [596540.569684] Lustre: MGS: Connection restored to b64a6e54-5e57-42e9-e7d6-c73db3e858e4 (at 10.151.35.180@o2ib) [596540.569690] Lustre: Skipped 45 previous similar messages [597254.588925] Lustre: MGS: Connection restored to 2ff775bb-92bc-1750-d489-d033f02e6efb (at 10.151.32.14@o2ib) [597254.588930] Lustre: Skipped 717 previous similar messages [598283.211281] Lustre: MGS: Connection restored to c8f90a6f-5248-c5ca-366c-f552e7361c2f (at 10.151.12.41@o2ib) [598283.211287] Lustre: Skipped 43 previous similar messages [598357.025970] Lustre: nbp8-MDT0000: haven't heard from client e2d1966c-ca79-17d8-976b-712c28bf8fda (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89976fb3d400, cur 1591280462 expire 1591280312 last 1591280235 [598357.099217] Lustre: Skipped 1 previous similar message [599268.865209] Lustre: MGS: Connection restored to e26dae3a-2220-e8c1-4a20-8e7617487250 (at 10.141.6.3@o2ib417) [599268.865214] Lustre: Skipped 173 previous similar messages [599944.376139] Lustre: MGS: Connection restored to 49221635-5f6e-a02e-d61c-c39e909c2ff4 (at 10.151.3.180@o2ib) [599944.376144] Lustre: Skipped 91 previous similar messages [600613.841553] Lustre: MGS: Connection restored to 5d992a23-e0dc-a68c-1c8e-c93bc2f31bd2 (at 10.141.7.51@o2ib417) [600613.841559] Lustre: Skipped 125 previous similar messages [601298.337966] Lustre: MGS: Connection restored to 809c3385-f20d-6475-f670-b4cad1625503 (at 10.149.5.73@o2ib313) [601298.337971] Lustre: Skipped 57 previous similar messages [601993.229559] Lustre: MGS: Connection restored to 028189f9-ba28-7d1d-c7b4-1b6f7029d2b9 (at 10.153.11.69@o2ib233) [601993.229564] Lustre: Skipped 369 previous similar messages [602632.466877] Lustre: MGS: Connection restored to 3ad76de0-76f7-df74-fe5e-e0487723f39d (at 10.151.37.142@o2ib) [602632.466883] Lustre: Skipped 1505 previous similar messages [603267.143442] Lustre: MGS: Connection restored to bba11470-af22-c198-0387-41079e843567 (at 10.151.14.79@o2ib) [603267.143447] Lustre: Skipped 21 previous similar messages [603870.731100] Lustre: MGS: Connection restored to 18e6390f-fe3c-ce17-c28b-49a3eec528f0 (at 10.151.4.45@o2ib) [603870.731106] Lustre: Skipped 355 previous similar messages [604587.254870] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [604587.254874] Lustre: Skipped 211 previous similar messages [605532.826404] Lustre: MGS: Connection restored to 1ce2b5fd-6c31-7d7c-0a7a-a345ae115bc6 (at 10.151.42.148@o2ib) [605532.826410] Lustre: Skipped 165 previous similar messages [606145.160376] Lustre: MGS: Connection restored to e5b49446-80b5-2721-f289-a546ace86de2 (at 10.149.2.128@o2ib313) [606145.160381] Lustre: Skipped 355 previous similar messages [606894.528227] Lustre: MGS: Connection restored to a69b6b8c-942a-f015-5d31-da18ef969e1b (at 10.151.35.117@o2ib) [606894.528232] Lustre: Skipped 37 previous similar messages [607136.258023] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [607136.291523] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.97@o2ib (292): c: 32, oc: 0, rc: 32 [607879.190607] Lustre: MGS: Connection restored to 3435fb52-109a-f1f6-2c2b-1b5d27a2f86a (at 10.141.3.139@o2ib417) [607879.190612] Lustre: Skipped 163 previous similar messages [608748.607865] Lustre: MGS: Connection restored to 9b1093c2-681f-9b98-477d-8c4cac071790 (at 10.151.3.123@o2ib) [608748.607871] Lustre: Skipped 103 previous similar messages [609380.074195] Lustre: MGS: Connection restored to 77da37a1-42ce-36f4-b72d-bfc801af5a73 (at 10.151.55.177@o2ib) [609380.074201] Lustre: Skipped 131 previous similar messages [610012.853799] Lustre: MGS: Connection restored to 21743b95-8427-c45a-65f9-463166849806 (at 10.141.7.55@o2ib417) [610012.853805] Lustre: Skipped 87 previous similar messages [610645.801616] Lustre: MGS: Connection restored to 4bfbe119-1d86-ba48-8af9-0db5e0a8a5b2 (at 10.151.19.132@o2ib) [610645.801621] Lustre: Skipped 123 previous similar messages [611300.418645] Lustre: MGS: Connection restored to 181fdad7-dd8d-abbd-4070-0842c7beb265 (at 10.151.9.123@o2ib) [611300.418651] Lustre: Skipped 145 previous similar messages [612135.533798] Lustre: MGS: Connection restored to b8e2a3ab-8ef7-74b7-cf09-b205f410abc7 (at 10.151.3.32@o2ib) [612135.533803] Lustre: Skipped 751 previous similar messages [612751.346658] Lustre: MGS: Connection restored to b8bf1995-9f90-a76c-5b3a-2bef751021fe (at 10.149.1.23@o2ib313) [612751.346663] Lustre: Skipped 403 previous similar messages [612906.556456] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ab6f14800, cur 1591295011 expire 1591294861 last 1591294784 [612906.629703] Lustre: Skipped 1 previous similar message [613487.371841] Lustre: MGS: Connection restored to b43e91f2-57a2-a235-8e90-4f8685b33a9f (at 10.151.37.55@o2ib) [613487.371847] Lustre: Skipped 173 previous similar messages [613945.505113] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [613945.538615] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.251@o2ib (291): c: 32, oc: 0, rc: 32 [614095.854114] Lustre: MGS: Connection restored to 81df8a59-e8bc-88a0-d563-cb62440a832e (at 10.149.14.149@o2ib313) [614095.854120] Lustre: Skipped 67 previous similar messages [615010.021252] Lustre: MGS: Connection restored to ad7eca1f-951c-3cd4-68c0-c353555c68f9 (at 10.151.44.26@o2ib) [615010.021258] Lustre: Skipped 147 previous similar messages [615089.144450] LustreError: 36632:0:(ldlm_lockd.c:2324:ldlm_cancel_handler()) ldlm_cancel from 10.149.2.243@o2ib313 arrived at 1591297193 with bad export cookie 11685977039089765997 [615315.644970] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899aaeb54400, cur 1591297420 expire 1591297270 last 1591297193 [615315.718222] Lustre: Skipped 1 previous similar message [615657.656716] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a001652400, cur 1591297762 expire 1591297612 last 1591297535 [615659.983385] Lustre: MGS: Connection restored to 20e9db7b-014b-2be6-813e-af44f81b8750 (at 10.151.36.232@o2ib) [615659.983391] Lustre: Skipped 389 previous similar messages [616258.679834] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a39d478000, cur 1591298363 expire 1591298213 last 1591298136 [616330.665845] Lustre: MGS: Connection restored to d9b0ad0e-e7d2-581c-ca70-7907a3c33dae (at 10.141.3.55@o2ib417) [616330.665852] Lustre: Skipped 186 previous similar messages [616859.701371] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899857bc4000, cur 1591298964 expire 1591298814 last 1591298737 [617195.587260] Lustre: MGS: Connection restored to 6d07393f-a52b-dacb-b89a-e2725b4c2cf0 (at 10.151.46.148@o2ib) [617195.587265] Lustre: Skipped 44 previous similar messages [617252.227075] LNet: 39032:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.8.97@o2ib version 12/12 incarnation 1588912996227712/1591299344417522 [617253.739372] LustreError: 8604:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.151.8.97@o2ib) returned error from blocking AST (req@ffff899b85b65e80 x1667957738244544 status -107 rc -107), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff8998c46c2d00/0xa22cee33d7b12f7a lrc: 4/0,0 mode: PR/PR res: [0x3608a6ea5:0xe703:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.151.8.97@o2ib remote: 0xf76bed38166012d3 expref: 42 pid: 8594 timeout: 617642 lvb_type: 0 [617253.881578] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.151.8.97@o2ib was evicted due to a lock blocking callback time out: rc -107 [617253.923384] LustreError: 5711:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.151.8.97@o2ib ns: mdt-nbp8-MDT0000_UUID lock: ffff8998c46c2d00/0xa22cee33d7b12f7a lrc: 3/0,0 mode: PR/PR res: [0x3608a6ea5:0xe703:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x60200400000020 nid: 10.151.8.97@o2ib remote: 0xf76bed38166012d3 expref: 43 pid: 8594 timeout: 0 lvb_type: 0 [617302.717627] Lustre: MGS: haven't heard from client 9cdc624d-b528-4ace-f0ff-621d637846f6 (at 10.151.8.102@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e8b72800, cur 1591299407 expire 1591299257 last 1591299180 [617315.717956] Lustre: nbp8-MDT0000: haven't heard from client 99578121-4cc1-b639-8df0-cd0e011d860f (at 10.151.8.113@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a403274400, cur 1591299420 expire 1591299270 last 1591299193 [617315.790351] Lustre: Skipped 9 previous similar messages [617391.720885] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 158 seconds. I think it's dead, and I am evicting it. exp ffff89a07bad9800, cur 1591299496 expire 1591299346 last 1591299338 [617391.794138] Lustre: Skipped 8 previous similar messages [617392.630800] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [617392.664294] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.111@o2ib (302): c: 30, oc: 0, rc: 32 [617393.630821] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [617393.664316] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.113@o2ib (303): c: 30, oc: 0, rc: 32 [617395.630896] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [617395.664393] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [617395.697589] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.117@o2ib (306): c: 30, oc: 0, rc: 32 [617395.738218] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [617414.632673] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [617414.666175] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.156@o2ib (324): c: 30, oc: 0, rc: 32 [617438.632467] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [617438.665954] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.102@o2ib (348): c: 30, oc: 0, rc: 32 [617832.816956] Lustre: MGS: Connection restored to 5b4d7143-8c7d-0803-ef20-4b2b6251b523 (at 10.149.16.17@o2ib313) [617832.816962] Lustre: Skipped 78 previous similar messages [618061.745290] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bd5747000, cur 1591300166 expire 1591300016 last 1591299939 [618436.421048] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [618436.421054] Lustre: Skipped 170 previous similar messages [618662.767946] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974acf73800, cur 1591300767 expire 1591300617 last 1591300540 [619037.563586] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [619037.563592] Lustre: Skipped 284 previous similar messages [619254.698751] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [619254.732258] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [619254.765753] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.44@o2ib (293): c: 32, oc: 0, rc: 32 [619254.806103] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [619263.788225] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897347e77400, cur 1591301368 expire 1591301218 last 1591301141 [619468.797900] Lustre: nbp8-MDT0000: haven't heard from client 46d254f7-b7de-944a-5be6-76c8935a01b9 (at 10.151.40.242@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d8dfe1400, cur 1591301573 expire 1591301423 last 1591301346 [619544.799349] Lustre: nbp8-MDT0000: haven't heard from client 7407c9b7-f40c-0b64-bc46-9f0bd2f9488a (at 10.151.39.214@o2ib) in 217 seconds. I think it's dead, and I am evicting it. exp ffff89a083ffc400, cur 1591301649 expire 1591301499 last 1591301432 [619544.872041] Lustre: Skipped 9 previous similar messages [619571.710313] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [619571.743800] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.40.216@o2ib (303): c: 32, oc: 0, rc: 32 [619577.710657] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [619577.744152] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.236@o2ib (306): c: 31, oc: 0, rc: 32 [619584.710818] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [619584.744319] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.40.242@o2ib (342): c: 30, oc: 0, rc: 32 [619629.712449] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [619629.745949] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.238@o2ib (338): c: 30, oc: 0, rc: 32 [619638.703588] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [619638.703595] Lustre: Skipped 650 previous similar messages [619680.714388] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [619680.747894] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [619680.781383] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.240@o2ib (322): c: 30, oc: 0, rc: 32 [619680.822304] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [619864.811907] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89778db47800, cur 1591301969 expire 1591301819 last 1591301742 [619864.885154] Lustre: Skipped 7 previous similar messages [620036.727280] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [620036.760782] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [620036.793982] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.159@o2ib (275): c: 32, oc: 0, rc: 32 [620036.834903] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [620239.845023] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [620239.845027] Lustre: Skipped 206 previous similar messages [620466.833166] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fc3e22800, cur 1591302571 expire 1591302421 last 1591302344 [620840.983070] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [620840.983075] Lustre: Skipped 114 previous similar messages [621067.855477] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89737c3e2000, cur 1591303172 expire 1591303022 last 1591302945 [621442.124768] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [621442.124773] Lustre: Skipped 256 previous similar messages [621668.876284] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973c4668400, cur 1591303773 expire 1591303623 last 1591303546 [622043.266336] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [622043.266341] Lustre: Skipped 62 previous similar messages [622269.899109] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3f56a5c00, cur 1591304374 expire 1591304224 last 1591304147 [622644.407051] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [622644.407055] Lustre: Skipped 62 previous similar messages [622754.826397] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [622754.859897] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.205@o2ib (303): c: 32, oc: 0, rc: 32 [622870.921572] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899cd6b86400, cur 1591304975 expire 1591304825 last 1591304748 [623245.551810] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [623245.551815] Lustre: Skipped 168 previous similar messages [623256.844689] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [623256.878185] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [623256.911386] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.206@o2ib (298): c: 32, oc: 0, rc: 32 [623256.952308] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [623471.942737] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dfad2a800, cur 1591305576 expire 1591305426 last 1591305349 [623534.854815] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 19 seconds [623534.888609] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.205@o2ib (258): c: 30, oc: 0, rc: 32 [623585.856754] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [623585.890246] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.207@o2ib (316): c: 30, oc: 0, rc: 32 [623846.720286] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [623846.720291] Lustre: Skipped 280 previous similar messages [624030.872859] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [624030.906360] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.234@o2ib (326): c: 30, oc: 0, rc: 32 [624072.963801] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2d87e8400, cur 1591306177 expire 1591306027 last 1591305950 [624073.037050] Lustre: Skipped 8 previous similar messages [624447.837955] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [624447.837960] Lustre: Skipped 178 previous similar messages [624577.892830] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [624577.926327] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [624577.959541] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.218@o2ib (309): c: 31, oc: 0, rc: 32 [624578.000463] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [624673.985422] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998b8a23400, cur 1591306778 expire 1591306628 last 1591306551 [624674.058673] Lustre: Skipped 2 previous similar messages [625048.980623] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [625048.980628] Lustre: Skipped 164 previous similar messages [625275.008099] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a083ff9c00, cur 1591307379 expire 1591307229 last 1591307152 [625635.931229] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [625635.964730] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.184@o2ib (236): c: 32, oc: 0, rc: 32 [625650.126839] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [625650.126845] Lustre: Skipped 192 previous similar messages [625877.030173] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899958ba4800, cur 1591307981 expire 1591307831 last 1591307754 [626251.270712] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [626251.270717] Lustre: Skipped 348 previous similar messages [626478.052543] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f58658000, cur 1591308582 expire 1591308432 last 1591308355 [626852.412897] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [626852.412902] Lustre: Skipped 300 previous similar messages [627079.072949] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b843cec00, cur 1591309183 expire 1591309033 last 1591308956 [627453.557753] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [627453.557759] Lustre: Skipped 366 previous similar messages [627680.094590] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b4fa3000, cur 1591309784 expire 1591309634 last 1591309557 [628055.030450] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [628055.030456] Lustre: Skipped 296 previous similar messages [628281.117513] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b5723000, cur 1591310385 expire 1591310235 last 1591310158 [628281.190767] Lustre: Skipped 2 previous similar messages [628655.848063] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [628655.848068] Lustre: Skipped 52 previous similar messages [628882.139509] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973aee1b000, cur 1591310986 expire 1591310836 last 1591310759 [629243.063532] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [629243.097034] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.108@o2ib (266): c: 32, oc: 0, rc: 32 [629256.989109] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [629256.989114] Lustre: Skipped 198 previous similar messages [629483.160155] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999aaac1000, cur 1591311587 expire 1591311437 last 1591311360 [629857.437987] Lustre: nbp8-MDT0000: Connection restored to e2586261-acff-9602-5f39-3158476f0d9e (at 10.151.31.74@o2ib) [629857.437992] Lustre: Skipped 177 previous similar messages [630084.182098] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897503fe9000, cur 1591312188 expire 1591312038 last 1591311961 [630459.273724] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [630459.273728] Lustre: Skipped 57 previous similar messages [630686.204513] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998ccae0c00, cur 1591312790 expire 1591312640 last 1591312563 [631060.414870] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [631060.414875] Lustre: Skipped 62 previous similar messages [631287.226774] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b7e691c00, cur 1591313391 expire 1591313241 last 1591313164 [631661.556502] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [631661.556507] Lustre: Skipped 30 previous similar messages [631888.248691] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f92fae000, cur 1591313992 expire 1591313842 last 1591313765 [632262.702678] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [632262.702683] Lustre: Skipped 86 previous similar messages [632489.270425] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8981593eec00, cur 1591314593 expire 1591314443 last 1591314366 [632863.840579] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [632863.840583] Lustre: Skipped 30 previous similar messages [633090.292221] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a6daf8400, cur 1591315194 expire 1591315044 last 1591314967 [633464.985273] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [633464.985278] Lustre: Skipped 66 previous similar messages [633627.222854] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [633627.256350] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.77@o2ib (275): c: 32, oc: 0, rc: 32 [633691.313455] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a215eafc00, cur 1591315795 expire 1591315645 last 1591315568 [634066.129946] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [634066.129951] Lustre: Skipped 180 previous similar messages [634292.335998] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a403668000, cur 1591316396 expire 1591316246 last 1591316169 [634667.272867] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [634667.272872] Lustre: Skipped 26 previous similar messages [634893.358096] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f95299400, cur 1591316997 expire 1591316847 last 1591316770 [635268.695519] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [635268.695524] Lustre: Skipped 172 previous similar messages [635495.380205] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89987a2da000, cur 1591317599 expire 1591317449 last 1591317372 [635869.557016] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [635869.557022] Lustre: Skipped 112 previous similar messages [636096.400781] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3a19d6c00, cur 1591318200 expire 1591318050 last 1591317973 [636470.699250] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [636470.699256] Lustre: Skipped 44 previous similar messages [636697.422671] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978d9ff4000, cur 1591318801 expire 1591318651 last 1591318574 [637071.841585] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [637071.841591] Lustre: Skipped 252 previous similar messages [637298.445570] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979d2a7a000, cur 1591319402 expire 1591319252 last 1591319175 [637672.984528] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [637672.984534] Lustre: Skipped 66 previous similar messages [637899.467554] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e768e000, cur 1591320003 expire 1591319853 last 1591319776 [638274.129050] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [638274.129055] Lustre: Skipped 80 previous similar messages [638500.488696] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a034bc5000, cur 1591320604 expire 1591320454 last 1591320377 [638875.270481] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [638875.270487] Lustre: Skipped 42 previous similar messages [639101.510278] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89740021b400, cur 1591321205 expire 1591321055 last 1591320978 [639476.414017] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [639476.414022] Lustre: Skipped 380 previous similar messages [639702.533054] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e3faf3c00, cur 1591321806 expire 1591321656 last 1591321579 [640077.560087] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [640077.560092] Lustre: Skipped 58 previous similar messages [640304.554963] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899813369c00, cur 1591322408 expire 1591322258 last 1591322181 [640678.702671] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [640678.702676] Lustre: Skipped 64 previous similar messages [640905.576683] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a378a7ac00, cur 1591323009 expire 1591322859 last 1591322782 [641059.492249] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [641059.525746] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.196@o2ib (303): c: 32, oc: 0, rc: 32 [641279.845477] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [641279.845483] Lustre: Skipped 82 previous similar messages [641506.598145] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973f362a800, cur 1591323610 expire 1591323460 last 1591323383 [641880.988190] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [641880.988195] Lustre: Skipped 461 previous similar messages [642373.540175] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [642373.573668] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.36@o2ib (303): c: 30, oc: 0, rc: 32 [642481.035497] Lustre: nbp8-MDT0000: Connection restored to 26caefa1-759d-49b0-156f-0bc93668766b (at 10.153.16.42@o2ib233) [642481.035502] Lustre: Skipped 508 previous similar messages [642708.642261] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f5fea7000, cur 1591324812 expire 1591324662 last 1591324585 [642708.715510] Lustre: Skipped 3 previous similar messages [643083.278099] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [643083.278104] Lustre: Skipped 776 previous similar messages [643309.667096] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f5865b400, cur 1591325413 expire 1591325263 last 1591325186 [643684.412927] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [643684.412932] Lustre: Skipped 26 previous similar messages [643910.685235] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b2ca7bc00, cur 1591326014 expire 1591325864 last 1591325787 [644285.558399] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [644285.558404] Lustre: Skipped 26 previous similar messages [644511.708610] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998592b9000, cur 1591326615 expire 1591326465 last 1591326388 [644886.702590] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [644886.702595] Lustre: Skipped 94 previous similar messages [645112.732829] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b32edf400, cur 1591327216 expire 1591327066 last 1591326989 [645487.843619] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [645487.843625] Lustre: Skipped 200 previous similar messages [645714.752206] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973c5734800, cur 1591327818 expire 1591327668 last 1591327591 [646088.984778] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [646088.984783] Lustre: Skipped 2 previous similar messages [646315.780720] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89752f789400, cur 1591328419 expire 1591328269 last 1591328192 [646690.127017] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [646690.127023] Lustre: Skipped 40 previous similar messages [646916.804623] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978d9ff1400, cur 1591329020 expire 1591328870 last 1591328793 [647175.715110] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [647175.748611] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.15@o2ib (296): c: 32, oc: 0, rc: 32 [647291.269260] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [647291.269265] Lustre: Skipped 84 previous similar messages [647517.818021] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899beca32400, cur 1591329621 expire 1591329471 last 1591329394 [647892.414338] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [647892.414344] Lustre: Skipped 58 previous similar messages [648118.838782] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b6397400, cur 1591330222 expire 1591330072 last 1591329995 [648459.762225] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [648459.795732] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.125@o2ib (303): c: 32, oc: 0, rc: 32 [648493.555894] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [648493.555899] Lustre: Skipped 116 previous similar messages [648719.863765] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89990864b400, cur 1591330823 expire 1591330673 last 1591330596 [649094.700654] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [649094.700659] Lustre: Skipped 124 previous similar messages [649320.886398] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f4c735400, cur 1591331424 expire 1591331274 last 1591331197 [649695.842052] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [649695.842056] Lustre: Skipped 214 previous similar messages [649921.905592] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bde2e7400, cur 1591332025 expire 1591331875 last 1591331798 [650296.984152] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [650296.984158] Lustre: Skipped 38 previous similar messages [650523.936555] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897aac2d2400, cur 1591332627 expire 1591332477 last 1591332400 [650898.127541] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [650898.127546] Lustre: Skipped 42 previous similar messages [651124.950419] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89985c23c000, cur 1591333228 expire 1591333078 last 1591333001 [651499.270088] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [651499.270093] Lustre: Skipped 228 previous similar messages [651725.971281] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997e27c0800, cur 1591333829 expire 1591333679 last 1591333602 [652100.411011] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [652100.411016] Lustre: Skipped 96 previous similar messages [652326.997596] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b4f670400, cur 1591334430 expire 1591334280 last 1591334203 [652701.553390] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [652701.553395] Lustre: Skipped 104 previous similar messages [652928.016300] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a396665800, cur 1591335031 expire 1591334881 last 1591334804 [653302.698481] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [653302.698485] Lustre: Skipped 6 previous similar messages [653529.041318] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a09e693000, cur 1591335632 expire 1591335482 last 1591335405 [653903.839889] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [653903.839894] Lustre: Skipped 96 previous similar messages [654130.062076] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997bc686c00, cur 1591336233 expire 1591336083 last 1591336006 [654504.980691] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [654504.980696] Lustre: Skipped 10 previous similar messages [654731.082716] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897475e42000, cur 1591336834 expire 1591336684 last 1591336607 [655106.124609] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [655106.124615] Lustre: Skipped 12 previous similar messages [655333.105304] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979c1244c00, cur 1591337436 expire 1591337286 last 1591337209 [655707.152514] Lustre: MGS: Connection restored to b6b3c963-8260-e76d-6e45-87a60e923c83 (at 10.149.11.127@o2ib313) [655707.152520] Lustre: Skipped 232 previous similar messages [655934.127385] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897437f37000, cur 1591338037 expire 1591337887 last 1591337810 [656308.410766] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [656308.410770] Lustre: Skipped 174 previous similar messages [656535.148529] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89975fb5c800, cur 1591338638 expire 1591338488 last 1591338411 [656909.555993] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [656909.555999] Lustre: Skipped 226 previous similar messages [657100.146831] LustreError: 12653:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.149.1.69@o2ib313) returned error from blocking AST (req@ffff897d173fa880 x1667957988363648 status -107 rc -107), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff89a4007069c0/0xa22cee34506cd2fa lrc: 4/0,0 mode: PR/PR res: [0x3608ab237:0x75a3:0x0].0x0 bits 0x13/0x0 rrc: 9 type: IBT flags: 0x60200400000020 nid: 10.149.1.69@o2ib313 remote: 0xe020a7631eb22fa0 expref: 96 pid: 12655 timeout: 657487 lvb_type: 0 [657100.291041] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.149.1.69@o2ib313 was evicted due to a lock blocking callback time out: rc -107 [657100.333707] LustreError: 5711:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 1s: evicting client at 10.149.1.69@o2ib313 ns: mdt-nbp8-MDT0000_UUID lock: ffff89a4007069c0/0xa22cee34506cd2fa lrc: 3/0,0 mode: PR/PR res: [0x3608ab237:0x75a3:0x0].0x0 bits 0x13/0x0 rrc: 10 type: IBT flags: 0x60200400000020 nid: 10.149.1.69@o2ib313 remote: 0xe020a7631eb22fa0 expref: 97 pid: 12655 timeout: 0 lvb_type: 0 [657136.171881] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e8b70400, cur 1591339239 expire 1591339089 last 1591339012 [657510.699228] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [657510.699233] Lustre: Skipped 152 previous similar messages [657737.192581] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899756bd9000, cur 1591339840 expire 1591339690 last 1591339613 [657737.265855] Lustre: Skipped 3 previous similar messages [658111.838248] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [658111.838253] Lustre: Skipped 146 previous similar messages [658338.214433] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899895a5bc00, cur 1591340441 expire 1591340291 last 1591340214 [658712.984159] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [658712.984164] Lustre: Skipped 168 previous similar messages [658839.324412] Process accounting resumed [658939.236320] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b8f6a7400, cur 1591341042 expire 1591340892 last 1591340815 [659314.125834] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [659314.125840] Lustre: Skipped 86 previous similar messages [659540.258770] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89752d33e000, cur 1591341643 expire 1591341493 last 1591341416 [659915.270427] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [659915.270433] Lustre: Skipped 48 previous similar messages [660141.281146] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b53b2f400, cur 1591342244 expire 1591342094 last 1591342017 [660516.414198] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [660516.414203] Lustre: Skipped 42 previous similar messages [660743.304007] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89987a287000, cur 1591342846 expire 1591342696 last 1591342619 [661117.556464] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [661117.556469] Lustre: Skipped 286 previous similar messages [661344.323547] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f8e610400, cur 1591343447 expire 1591343297 last 1591343220 [661718.699099] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [661718.699103] Lustre: Skipped 8 previous similar messages [661945.346275] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997bca4d800, cur 1591344048 expire 1591343898 last 1591343821 [662319.848083] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [662319.848087] Lustre: Skipped 20 previous similar messages [662546.371571] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3988d5c00, cur 1591344649 expire 1591344499 last 1591344422 [662920.999397] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [662920.999403] Lustre: Skipped 24 previous similar messages [663147.389384] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f45f55800, cur 1591345250 expire 1591345100 last 1591345023 [663522.139181] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [663522.139187] Lustre: Skipped 46 previous similar messages [663748.413500] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cc74cb800, cur 1591345851 expire 1591345701 last 1591345624 [664123.275979] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [664123.275985] Lustre: Skipped 124 previous similar messages [664349.434296] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974b3e95400, cur 1591346452 expire 1591346302 last 1591346225 [664724.434209] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [664724.434215] Lustre: Skipped 28 previous similar messages [664950.455108] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89830b172400, cur 1591347053 expire 1591346903 last 1591346826 [665325.569643] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [665325.569648] Lustre: Skipped 230 previous similar messages [665552.477361] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b21ed0800, cur 1591347655 expire 1591347505 last 1591347428 [665926.713067] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [665926.713073] Lustre: Skipped 10 previous similar messages [666153.503747] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973af61d800, cur 1591348256 expire 1591348106 last 1591348029 [666527.856594] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [666527.856599] Lustre: Skipped 60 previous similar messages [666754.523648] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3fee1cc00, cur 1591348857 expire 1591348707 last 1591348630 [667129.001684] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [667129.001689] Lustre: Skipped 324 previous similar messages [667355.543028] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999cda9c000, cur 1591349458 expire 1591349308 last 1591349231 [667730.148185] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [667730.148190] Lustre: Skipped 168 previous similar messages [667956.566186] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899dceb67000, cur 1591350059 expire 1591349909 last 1591349832 [668331.287598] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [668331.287604] Lustre: Skipped 10 previous similar messages [668557.588190] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b2fed400, cur 1591350660 expire 1591350510 last 1591350433 [668932.430685] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [668932.430690] Lustre: Skipped 20 previous similar messages [669158.609167] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89979561ec00, cur 1591351261 expire 1591351111 last 1591351034 [669533.570033] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [669759.631982] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f443ad400, cur 1591351862 expire 1591351712 last 1591351635 [670134.715070] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [670134.715075] Lustre: Skipped 134 previous similar messages [670361.656365] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998b1378800, cur 1591352464 expire 1591352314 last 1591352237 [670735.855798] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [670735.855803] Lustre: Skipped 36 previous similar messages [670962.674914] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999bfb45c00, cur 1591353065 expire 1591352915 last 1591352838 [671337.000091] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [671337.000096] Lustre: Skipped 86 previous similar messages [671563.698229] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bdee3ec00, cur 1591353666 expire 1591353516 last 1591353439 [671938.146441] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [671938.146446] Lustre: Skipped 2 previous similar messages [672164.721249] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ec4edd400, cur 1591354267 expire 1591354117 last 1591354040 [672539.290290] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [672539.290296] Lustre: Skipped 86 previous similar messages [672765.741446] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e0220c400, cur 1591354868 expire 1591354718 last 1591354641 [673140.432146] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [673140.432152] Lustre: Skipped 24 previous similar messages [673366.763067] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c583d400, cur 1591355469 expire 1591355319 last 1591355242 [673424.676699] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [673424.710192] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.78@o2ib (293): c: 32, oc: 0, rc: 32 [673741.573374] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [673741.573380] Lustre: Skipped 24 previous similar messages [673967.785606] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898050b9f000, cur 1591356070 expire 1591355920 last 1591355843 [674342.717035] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [674342.717040] Lustre: Skipped 81 previous similar messages [674556.148470] Lustre: MGS: Received new LWP connection from 10.141.6.165@o2ib417, removing former export from same NID [674556.183388] Lustre: Skipped 1 previous similar message [674568.809289] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998716a5000, cur 1591356671 expire 1591356521 last 1591356444 [674943.856957] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [674943.856963] Lustre: Skipped 915 previous similar messages [675170.829602] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978ea3f5800, cur 1591357273 expire 1591357123 last 1591357046 [675545.000924] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [675771.851909] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998b1eb2c00, cur 1591357874 expire 1591357724 last 1591357647 [676146.143251] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [676146.143256] Lustre: Skipped 86 previous similar messages [676372.877426] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e482e4000, cur 1591358475 expire 1591358325 last 1591358248 [676747.284444] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [676747.284449] Lustre: Skipped 74 previous similar messages [676973.895449] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997b771e000, cur 1591359076 expire 1591358926 last 1591358849 [676973.968727] Lustre: Skipped 2 previous similar messages [677266.816158] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [677266.849650] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.127@o2ib (303): c: 32, oc: 0, rc: 32 [677348.426403] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [677348.426409] Lustre: Skipped 2 previous similar messages [677574.919414] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897b08273400, cur 1591359677 expire 1591359527 last 1591359450 [677949.567483] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [677949.567488] Lustre: Skipped 270 previous similar messages [678175.939783] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a087f5bc00, cur 1591360278 expire 1591360128 last 1591360051 [678428.858513] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [678428.892013] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.120@o2ib (296): c: 32, oc: 0, rc: 32 [678550.711163] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [678550.711167] Lustre: Skipped 84 previous similar messages [678776.962410] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ddaac0800, cur 1591360879 expire 1591360729 last 1591360652 [679151.855557] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [679151.855562] Lustre: Skipped 378 previous similar messages [679377.984337] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997bf27ac00, cur 1591361480 expire 1591361330 last 1591361253 [679752.997892] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [679752.997897] Lustre: Skipped 118 previous similar messages [679949.914173] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [679949.947674] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.133@o2ib (303): c: 32, oc: 0, rc: 32 [679984.915474] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [679984.948976] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.109@o2ib (303): c: 32, oc: 0, rc: 32 [680007.916308] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [680007.949814] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.148@o2ib (306): c: 30, oc: 0, rc: 32 [680014.916561] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [680014.950062] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.162@o2ib (314): c: 30, oc: 0, rc: 32 [680022.916861] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [680022.950367] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [680022.983854] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.178@o2ib (322): c: 30, oc: 0, rc: 32 [680023.024784] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [680043.917614] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [680043.951121] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.119@o2ib (343): c: 30, oc: 0, rc: 32 [680354.138790] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [680354.138795] Lustre: Skipped 28 previous similar messages [680581.027909] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89996ef51400, cur 1591362683 expire 1591362533 last 1591362456 [680581.101178] Lustre: Skipped 25 previous similar messages [680955.282312] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [680955.282317] Lustre: Skipped 148 previous similar messages [681182.051395] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899871361400, cur 1591363284 expire 1591363134 last 1591363057 [681477.969940] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [681478.003439] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [681478.036919] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.172@o2ib (304): c: 32, oc: 0, rc: 32 [681478.077840] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [681556.428162] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [681556.428168] Lustre: Skipped 6 previous similar messages [681783.071499] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b722f400, cur 1591363885 expire 1591363735 last 1591363658 [682157.573567] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [682157.573572] Lustre: Skipped 114 previous similar messages [682384.105357] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89809e3cd400, cur 1591364486 expire 1591364336 last 1591364259 [682758.712941] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [682758.712947] Lustre: Skipped 64 previous similar messages [682985.126003] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89750def2400, cur 1591365087 expire 1591364937 last 1591364860 [683359.857170] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [683359.857175] Lustre: Skipped 140 previous similar messages [683586.140424] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b84614800, cur 1591365688 expire 1591365538 last 1591365461 [683960.999385] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [683960.999391] Lustre: Skipped 912 previous similar messages [684187.162536] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897aa8f1f400, cur 1591366289 expire 1591366139 last 1591366062 [684562.141033] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [684562.141038] Lustre: Skipped 2105 previous similar messages [684788.180004] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bde2e5800, cur 1591366890 expire 1591366740 last 1591366663 [685099.102364] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [685099.135848] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [685099.169056] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.250@o2ib (307): c: 30, oc: 0, rc: 32 [685099.209972] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [685107.102739] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [685107.136215] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.213@o2ib (314): c: 30, oc: 0, rc: 32 [685109.102789] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [685109.136285] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.223@o2ib (317): c: 30, oc: 0, rc: 32 [685118.103069] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [685118.136562] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.234@o2ib (325): c: 30, oc: 0, rc: 32 [685127.103491] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [685127.136978] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.206@o2ib (333): c: 30, oc: 0, rc: 32 [685141.104002] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [685141.137493] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.227@o2ib (347): c: 30, oc: 0, rc: 32 [685163.282233] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [685163.282238] Lustre: Skipped 304 previous similar messages [685390.203633] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999b2b34000, cur 1591367492 expire 1591367342 last 1591367265 [685390.276907] Lustre: Skipped 16 previous similar messages [685764.424765] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [685764.424770] Lustre: Skipped 62 previous similar messages [685991.228136] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899db7f92400, cur 1591368093 expire 1591367943 last 1591367866 [686156.141056] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [686156.174563] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.39.114@o2ib (303): c: 32, oc: 0, rc: 32 [686365.568038] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [686365.568044] Lustre: Skipped 864 previous similar messages [686592.246478] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89809e3ce800, cur 1591368694 expire 1591368544 last 1591368467 [686966.710276] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [686966.710281] Lustre: Skipped 166 previous similar messages [687193.269910] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3d1ce6000, cur 1591369295 expire 1591369145 last 1591369068 [687567.852713] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [687567.852718] Lustre: Skipped 50 previous similar messages [687794.290459] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999a1e82800, cur 1591369896 expire 1591369746 last 1591369669 [688168.997585] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [688168.997590] Lustre: Skipped 52 previous similar messages [688395.315094] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e628cc00, cur 1591370497 expire 1591370347 last 1591370270 [688395.388385] Lustre: Skipped 2 previous similar messages [688770.142985] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [688770.142990] Lustre: Skipped 128 previous similar messages [688996.337206] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89974cbe2400, cur 1591371098 expire 1591370948 last 1591370871 [689371.289789] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [689371.289794] Lustre: Skipped 326 previous similar messages [689597.356547] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999cabf2800, cur 1591371699 expire 1591371549 last 1591371472 [689972.436164] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [689972.436170] Lustre: Skipped 50 previous similar messages [690199.381236] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a008748000, cur 1591372301 expire 1591372151 last 1591372074 [690573.580558] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [690573.580563] Lustre: Skipped 32 previous similar messages [690800.401188] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89798c603800, cur 1591372902 expire 1591372752 last 1591372675 [691174.724946] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [691174.724952] Lustre: Skipped 124 previous similar messages [691401.424505] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976a3362c00, cur 1591373503 expire 1591373353 last 1591373276 [691775.871668] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [691775.871674] Lustre: Skipped 474 previous similar messages [692002.445959] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978ffb14800, cur 1591374104 expire 1591373954 last 1591373877 [692377.015187] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [692377.015192] Lustre: Skipped 318 previous similar messages [692603.470038] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a6dafb400, cur 1591374705 expire 1591374555 last 1591374478 [692978.158251] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [692978.158256] Lustre: Skipped 36 previous similar messages [693204.488184] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ad6281000, cur 1591375306 expire 1591375156 last 1591375079 [693579.299663] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [693579.299668] Lustre: Skipped 128 previous similar messages [693805.512313] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997ccbd1800, cur 1591375907 expire 1591375757 last 1591375680 [694180.440452] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [694180.440457] Lustre: Skipped 132 previous similar messages [694406.538324] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fd2e30400, cur 1591376508 expire 1591376358 last 1591376281 [694780.860199] Lustre: MGS: Connection restored to f5bec115-3ce5-6617-c284-cd464bbeb05b (at 10.151.32.213@o2ib) [694780.860204] Lustre: Skipped 124 previous similar messages [695008.554355] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a0dcf0bc00, cur 1591377110 expire 1591376960 last 1591376883 [695382.729108] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [695382.729114] Lustre: Skipped 188 previous similar messages [695458.481664] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [695458.515155] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.37@o2ib (303): c: 32, oc: 0, rc: 32 [695609.580663] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979b3a10c00, cur 1591377711 expire 1591377561 last 1591377484 [695983.872937] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [695983.872942] Lustre: Skipped 10 previous similar messages [696210.606326] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997ccbd0c00, cur 1591378312 expire 1591378162 last 1591378085 [696585.013417] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [696585.013422] Lustre: Skipped 36 previous similar messages [696811.620653] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899902bfdc00, cur 1591378913 expire 1591378763 last 1591378686 [697186.158973] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [697186.158979] Lustre: Skipped 120 previous similar messages [697412.643858] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b32eddc00, cur 1591379514 expire 1591379364 last 1591379287 [697787.303826] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [697787.303831] Lustre: Skipped 28 previous similar messages [698013.666976] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89987c229c00, cur 1591380115 expire 1591379965 last 1591379888 [698388.445090] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [698388.445095] Lustre: Skipped 50 previous similar messages [698614.686280] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f5eab5800, cur 1591380716 expire 1591380566 last 1591380489 [698989.586657] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [698989.586662] Lustre: Skipped 54 previous similar messages [699215.709818] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b32edc000, cur 1591381317 expire 1591381167 last 1591381090 [699589.989152] Lustre: MGS: Connection restored to 09883bed-841b-96e6-71f5-cb9e8d47cb98 (at 10.149.14.37@o2ib313) [699589.989158] Lustre: Skipped 586 previous similar messages [699817.739620] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979b8e4b800, cur 1591381919 expire 1591381769 last 1591381692 [700191.867274] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [700191.867280] Lustre: Skipped 130 previous similar messages [700418.759488] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f6dbc1c00, cur 1591382520 expire 1591382370 last 1591382293 [700793.010470] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [700793.010476] Lustre: Skipped 202 previous similar messages [701019.775275] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c6e1e800, cur 1591383121 expire 1591382971 last 1591382894 [701394.151955] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [701394.151960] Lustre: Skipped 264 previous similar messages [701620.798437] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999f26e2c00, cur 1591383722 expire 1591383572 last 1591383495 [701995.293459] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [701995.293464] Lustre: Skipped 78 previous similar messages [702221.818865] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89752d328000, cur 1591384323 expire 1591384173 last 1591384096 [702596.434929] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [702596.434934] Lustre: Skipped 172 previous similar messages [702822.843000] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a044a3cc00, cur 1591384924 expire 1591384774 last 1591384697 [703197.574150] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [703197.574155] Lustre: Skipped 86 previous similar messages [703340.769904] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [703340.803413] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.45@o2ib (278): c: 32, oc: 0, rc: 32 [703346.770128] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [703346.803622] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.57@o2ib (296): c: 32, oc: 0, rc: 32 [703423.865917] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997cd65b000, cur 1591385525 expire 1591385375 last 1591385298 [703798.716739] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [703798.716744] Lustre: Skipped 46 previous similar messages [704024.885885] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899dd679ec00, cur 1591386126 expire 1591385976 last 1591385899 [704399.858008] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [704399.858013] Lustre: Skipped 8 previous similar messages [704625.908748] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974e4217800, cur 1591386727 expire 1591386577 last 1591386500 [704625.982039] Lustre: Skipped 2 previous similar messages [705001.000729] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [705001.000735] Lustre: Skipped 110 previous similar messages [705227.929224] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89987a286400, cur 1591387329 expire 1591387179 last 1591387102 [705602.145707] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [705602.145712] Lustre: Skipped 12 previous similar messages [705828.955796] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f662f8c00, cur 1591387930 expire 1591387780 last 1591387703 [706203.290155] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [706203.290160] Lustre: Skipped 66 previous similar messages [706429.982030] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899be424d800, cur 1591388531 expire 1591388381 last 1591388304 [706804.434853] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [706804.434858] Lustre: Skipped 18 previous similar messages [707030.995939] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bbe246800, cur 1591389132 expire 1591388982 last 1591388905 [707405.579794] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [707405.579799] Lustre: Skipped 230 previous similar messages [707632.020413] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f8124b000, cur 1591389733 expire 1591389583 last 1591389506 [708006.721354] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [708006.721359] Lustre: Skipped 188 previous similar messages [708233.038689] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899adffea400, cur 1591390334 expire 1591390184 last 1591390107 [708607.865686] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [708607.865691] Lustre: Skipped 44 previous similar messages [708834.063628] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d36959000, cur 1591390935 expire 1591390785 last 1591390708 [709209.009137] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [709209.009143] Lustre: Skipped 216 previous similar messages [709435.084511] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975692ea400, cur 1591391536 expire 1591391386 last 1591391309 [709810.152264] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [709810.152270] Lustre: Skipped 10 previous similar messages [710037.104373] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c1bfd9000, cur 1591392138 expire 1591391988 last 1591391911 [710411.297660] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [710411.297666] Lustre: Skipped 170 previous similar messages [710638.128992] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ac1299c00, cur 1591392739 expire 1591392589 last 1591392512 [710804.042865] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [710804.076351] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.132@o2ib (343): c: 30, oc: 0, rc: 32 [711012.438601] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [711012.438607] Lustre: Skipped 30 previous similar messages [711239.150036] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974cc134000, cur 1591393340 expire 1591393190 last 1591393113 [711239.223323] Lustre: Skipped 2 previous similar messages [711613.582257] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [711613.582261] Lustre: Skipped 268 previous similar messages [711840.169880] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89752d32e000, cur 1591393941 expire 1591393791 last 1591393714 [712214.727157] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [712214.727162] Lustre: Skipped 86 previous similar messages [712441.194989] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e52218400, cur 1591394542 expire 1591394392 last 1591394315 [712815.869355] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [712815.869359] Lustre: Skipped 350 previous similar messages [713042.215032] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899756bdc400, cur 1591395143 expire 1591394993 last 1591394916 [713417.010290] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [713417.010295] Lustre: Skipped 146 previous similar messages [713643.235999] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b1a5a400, cur 1591395744 expire 1591395594 last 1591395517 [714018.162059] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [714018.162064] Lustre: Skipped 72 previous similar messages [714244.259757] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998f53b2000, cur 1591396345 expire 1591396195 last 1591396118 [714619.294475] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [714619.294480] Lustre: Skipped 322 previous similar messages [714846.287058] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89981336c400, cur 1591396947 expire 1591396797 last 1591396720 [715220.436496] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [715220.436502] Lustre: Skipped 50 previous similar messages [715447.301785] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998e0a85c00, cur 1591397548 expire 1591397398 last 1591397321 [715821.579475] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [715821.579480] Lustre: Skipped 326 previous similar messages [716048.324761] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998c421dc00, cur 1591398149 expire 1591397999 last 1591397922 [716422.720990] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [716422.720995] Lustre: Skipped 122 previous similar messages [716649.346282] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998f729b400, cur 1591398750 expire 1591398600 last 1591398523 [717023.864011] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [717023.864016] Lustre: Skipped 146 previous similar messages [717250.368547] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bbbe67000, cur 1591399351 expire 1591399201 last 1591399124 [717593.290847] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [717593.324339] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.134@o2ib (303): c: 32, oc: 0, rc: 32 [717625.005411] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [717625.005417] Lustre: Skipped 152 previous similar messages [717851.389852] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89752eaf5800, cur 1591399952 expire 1591399802 last 1591399725 [718226.149676] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [718226.149681] Lustre: Skipped 196 previous similar messages [718452.412251] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898190564c00, cur 1591400553 expire 1591400403 last 1591400326 [718827.296087] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [718827.296093] Lustre: Skipped 30 previous similar messages [719053.434355] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b7f7bc00, cur 1591401154 expire 1591401004 last 1591400927 [719428.443137] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [719428.443142] Lustre: Skipped 50 previous similar messages [719654.455141] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a0b2b17000, cur 1591401755 expire 1591401605 last 1591401528 [720029.586776] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [720029.586781] Lustre: Skipped 48 previous similar messages [720256.478923] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e7623c00, cur 1591402357 expire 1591402207 last 1591402130 [720630.729189] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [720630.729194] Lustre: Skipped 20 previous similar messages [720857.499758] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996f2798400, cur 1591402958 expire 1591402808 last 1591402731 [720866.410010] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [720866.443508] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.213@o2ib (303): c: 32, oc: 0, rc: 32 [721231.869330] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [721231.869334] Lustre: Skipped 516 previous similar messages [721458.521582] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899910e7e400, cur 1591403559 expire 1591403409 last 1591403332 [721833.013013] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [721833.013017] Lustre: Skipped 142 previous similar messages [722059.545311] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a76ff3000, cur 1591404160 expire 1591404010 last 1591403933 [722434.154043] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [722434.154049] Lustre: Skipped 106 previous similar messages [722660.564515] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897952e9f000, cur 1591404761 expire 1591404611 last 1591404534 [723035.296927] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [723035.296932] Lustre: Skipped 326 previous similar messages [723261.588833] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f1a7ef000, cur 1591405362 expire 1591405212 last 1591405135 [723391.502079] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [723391.535578] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.66@o2ib (303): c: 32, oc: 0, rc: 32 [723636.441788] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [723636.441793] Lustre: Skipped 378 previous similar messages [723862.609380] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996f279ac00, cur 1591405963 expire 1591405813 last 1591405736 [724237.590176] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [724237.590181] Lustre: Skipped 70 previous similar messages [724463.631069] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999a97e2800, cur 1591406564 expire 1591406414 last 1591406337 [724838.734465] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [724838.734470] Lustre: Skipped 76 previous similar messages [725065.652071] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3d5360400, cur 1591407166 expire 1591407016 last 1591406939 [725439.872220] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [725439.872226] Lustre: Skipped 22 previous similar messages [725666.673817] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899d676c7c00, cur 1591407767 expire 1591407617 last 1591407540 [726040.695229] Lustre: MGS: Connection restored to 1135acd8-005a-1225-ae33-e3ac497987c1 (at 10.153.10.77@o2ib233) [726040.695234] Lustre: Skipped 120 previous similar messages [726267.697008] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3a1de3400, cur 1591408368 expire 1591408218 last 1591408141 [726521.616096] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [726521.649582] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.165@o2ib (299): c: 32, oc: 0, rc: 32 [726641.578362] Lustre: MGS: Connection restored to 948a3e52-5cf4-2b91-aa55-b735e15bbea7 (at 10.151.50.170@o2ib) [726641.578368] Lustre: Skipped 1720 previous similar messages [726868.721686] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89790d7bb000, cur 1591408969 expire 1591408819 last 1591408742 [726922.630740] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [726922.664247] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.145@o2ib (303): c: 32, oc: 0, rc: 32 [727243.303587] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [727243.303593] Lustre: Skipped 90 previous similar messages [727469.740829] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973d5b31c00, cur 1591409570 expire 1591409420 last 1591409343 [727844.446760] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [727844.446766] Lustre: Skipped 28 previous similar messages [728070.763449] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979bbf7f000, cur 1591410171 expire 1591410021 last 1591409944 [728445.590327] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [728445.590332] Lustre: Skipped 684 previous similar messages [728671.784840] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fc9fc0800, cur 1591410772 expire 1591410622 last 1591410545 [729046.732298] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [729046.732303] Lustre: Skipped 34 previous similar messages [729272.809039] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973327a5c00, cur 1591411373 expire 1591411223 last 1591411146 [729647.873002] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [729647.873007] Lustre: Skipped 28 previous similar messages [729874.827475] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f662f8000, cur 1591411975 expire 1591411825 last 1591411748 [730249.014998] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [730249.015003] Lustre: Skipped 98 previous similar messages [730475.850532] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89987a2db800, cur 1591412576 expire 1591412426 last 1591412349 [730850.154979] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [730850.154984] Lustre: Skipped 530 previous similar messages [731076.872534] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a636db000, cur 1591413177 expire 1591413027 last 1591412950 [731451.301081] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [731451.301087] Lustre: Skipped 58 previous similar messages [731677.894488] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979b8f30c00, cur 1591413778 expire 1591413628 last 1591413551 [732052.450073] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [732052.450078] Lustre: Skipped 30 previous similar messages [732278.921682] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89998daea400, cur 1591414379 expire 1591414229 last 1591414152 [732653.581912] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [732653.581917] Lustre: Skipped 186 previous similar messages [732879.938457] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ff4a67000, cur 1591414980 expire 1591414830 last 1591414753 [733254.724346] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [733254.724351] Lustre: Skipped 98 previous similar messages [733480.960892] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89989720bc00, cur 1591415581 expire 1591415431 last 1591415354 [733855.867414] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [733855.867419] Lustre: Skipped 72 previous similar messages [734081.981848] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b84617400, cur 1591416182 expire 1591416032 last 1591415955 [734457.012748] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [734684.004864] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a059ab1c00, cur 1591416784 expire 1591416634 last 1591416557 [735058.152761] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [735058.152765] Lustre: Skipped 2 previous similar messages [735285.026895] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f7e3c1800, cur 1591417385 expire 1591417235 last 1591417158 [735659.298972] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [735659.298978] Lustre: Skipped 94 previous similar messages [735886.047909] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89778ba1e400, cur 1591417986 expire 1591417836 last 1591417759 [736260.444754] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [736260.444759] Lustre: Skipped 262 previous similar messages [736487.071084] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897eaacb4400, cur 1591418587 expire 1591418437 last 1591418360 [736861.587242] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [736861.587248] Lustre: Skipped 20 previous similar messages [737088.103146] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897502326000, cur 1591419188 expire 1591419038 last 1591418961 [737462.728905] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [737462.728910] Lustre: Skipped 26 previous similar messages [737689.117897] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a034bc2000, cur 1591419789 expire 1591419639 last 1591419562 [738063.872921] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [738063.872927] Lustre: Skipped 4 previous similar messages [738290.137239] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974eee53000, cur 1591420390 expire 1591420240 last 1591420163 [738665.014914] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [738665.014919] Lustre: Skipped 202 previous similar messages [738891.160034] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3675ae000, cur 1591420991 expire 1591420841 last 1591420764 [739266.156787] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [739266.156792] Lustre: Skipped 2 previous similar messages [739492.180792] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bdc3bf800, cur 1591421592 expire 1591421442 last 1591421365 [739867.296270] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [739867.296275] Lustre: Skipped 66 previous similar messages [740094.204053] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ba5b22400, cur 1591422194 expire 1591422044 last 1591421967 [740468.442248] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [740468.442254] Lustre: Skipped 214 previous similar messages [740695.225961] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d1b678000, cur 1591422795 expire 1591422645 last 1591422568 [741069.584440] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [741069.584445] Lustre: Skipped 4 previous similar messages [741296.247014] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f51a8ac00, cur 1591423396 expire 1591423246 last 1591423169 [741670.736801] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [741670.736807] Lustre: Skipped 348 previous similar messages [741897.269818] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998553eec00, cur 1591423997 expire 1591423847 last 1591423770 [742271.874896] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [742271.874901] Lustre: Skipped 6 previous similar messages [742498.291738] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3675a9800, cur 1591424598 expire 1591424448 last 1591424371 [742873.013469] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [742873.013474] Lustre: Skipped 18 previous similar messages [743099.313018] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899eecbef400, cur 1591425199 expire 1591425049 last 1591424972 [743474.155763] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [743474.155768] Lustre: Skipped 46 previous similar messages [743700.335710] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89987abfcc00, cur 1591425800 expire 1591425650 last 1591425573 [744075.298503] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [744075.298508] Lustre: Skipped 62 previous similar messages [744301.362710] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3644e6400, cur 1591426401 expire 1591426251 last 1591426174 [744676.441651] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [744676.441657] Lustre: Skipped 2 previous similar messages [744903.380089] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b7f5d000, cur 1591427003 expire 1591426853 last 1591426776 [745014.293706] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745014.327208] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.35@o2ib (303): c: 32, oc: 0, rc: 32 [745016.293630] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745016.327123] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.38@o2ib (293): c: 32, oc: 0, rc: 32 [745018.294692] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [745018.328183] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [745018.361398] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.42@o2ib (267): c: 32, oc: 0, rc: 32 [745018.402030] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [745023.293895] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745023.327394] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.52@o2ib (283): c: 32, oc: 0, rc: 32 [745073.295706] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745073.329200] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.51@o2ib (303): c: 32, oc: 0, rc: 32 [745081.297015] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [745081.330523] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.67@o2ib (269): c: 32, oc: 0, rc: 32 [745104.297817] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745104.331317] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.113@o2ib (302): c: 32, oc: 0, rc: 32 [745143.298256] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745143.331745] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [745143.365233] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.90@o2ib (304): c: 32, oc: 0, rc: 32 [745143.405877] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [745209.300671] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745209.334195] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [745209.367691] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.121@o2ib (304): c: 32, oc: 0, rc: 32 [745209.408604] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [745277.587731] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [745277.587737] Lustre: Skipped 176 previous similar messages [745339.305534] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [745339.339034] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 14 previous similar messages [745339.372816] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.25@o2ib (299): c: 32, oc: 0, rc: 32 [745339.413445] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 14 previous similar messages [745504.403371] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973d06ff000, cur 1591427604 expire 1591427454 last 1591427377 [745504.476654] Lustre: Skipped 2 previous similar messages [745782.279910] Process accounting resumed [745878.727615] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [745878.727620] Lustre: Skipped 28 previous similar messages [746016.330137] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [746016.363646] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 28 previous similar messages [746016.397420] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.11@o2ib (304): c: 32, oc: 0, rc: 32 [746016.438045] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 28 previous similar messages [746105.423936] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973c5730400, cur 1591428205 expire 1591428055 last 1591427978 [746479.869965] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [746479.869970] Lustre: Skipped 14 previous similar messages [746569.351311] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [746569.384810] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 73 previous similar messages [746569.418588] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.20@o2ib (304): c: 32, oc: 0, rc: 32 [746569.459216] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 73 previous similar messages [746706.445328] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899871365800, cur 1591428806 expire 1591428656 last 1591428579 [746706.518590] Lustre: Skipped 58 previous similar messages [747081.010181] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [747081.010187] Lustre: Skipped 118 previous similar messages [747175.372439] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [747175.405947] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 48 previous similar messages [747175.439727] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.93@o2ib (304): c: 32, oc: 0, rc: 32 [747175.480360] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 48 previous similar messages [747307.470667] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897a14a1b400, cur 1591429407 expire 1591429257 last 1591429180 [747307.543948] Lustre: Skipped 68 previous similar messages [747682.154842] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [747682.154848] Lustre: Skipped 10 previous similar messages [747908.490911] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a7223f800, cur 1591430008 expire 1591429858 last 1591429781 [748283.296802] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [748283.296807] Lustre: Skipped 24 previous similar messages [748509.512098] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a193ac1c00, cur 1591430609 expire 1591430459 last 1591430382 [748884.438890] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [748884.438894] Lustre: Skipped 118 previous similar messages [749110.534535] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998592bcc00, cur 1591431210 expire 1591431060 last 1591430983 [749485.582276] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [749485.582282] Lustre: Skipped 144 previous similar messages [749712.555839] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dc0e14400, cur 1591431812 expire 1591431662 last 1591431585 [750086.726508] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [750086.726513] Lustre: Skipped 4 previous similar messages [750313.577622] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f6d320c00, cur 1591432413 expire 1591432263 last 1591432186 [750581.497013] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [750581.530514] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 18 previous similar messages [750581.564296] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.95@o2ib (300): c: 32, oc: 0, rc: 32 [750581.604653] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 18 previous similar messages [750687.868725] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [750687.868730] Lustre: Skipped 56 previous similar messages [750914.598846] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899810b27800, cur 1591433014 expire 1591432864 last 1591432787 [751289.010906] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [751289.010911] Lustre: Skipped 356 previous similar messages [751515.623319] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899aae266c00, cur 1591433615 expire 1591433465 last 1591433388 [751890.153982] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [751890.153987] Lustre: Skipped 202 previous similar messages [752116.643773] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ba8db400, cur 1591434216 expire 1591434066 last 1591433989 [752491.299918] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [752491.299922] Lustre: Skipped 2 previous similar messages [752717.666684] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899897387800, cur 1591434817 expire 1591434667 last 1591434590 [753092.442284] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [753092.442289] Lustre: Skipped 102 previous similar messages [753318.693210] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997bc683000, cur 1591435418 expire 1591435268 last 1591435191 [753693.587319] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [753693.587325] Lustre: Skipped 2 previous similar messages [753919.709319] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975c7bbb800, cur 1591436019 expire 1591435869 last 1591435792 [754294.730085] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [754294.730091] Lustre: Skipped 252 previous similar messages [754521.734177] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89725ff8e400, cur 1591436621 expire 1591436471 last 1591436394 [754895.872290] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [754895.872296] Lustre: Skipped 234 previous similar messages [755122.756819] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dbd648800, cur 1591437222 expire 1591437072 last 1591436995 [755497.016576] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [755497.016582] Lustre: Skipped 404 previous similar messages [755723.780188] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ad7ff6400, cur 1591437823 expire 1591437673 last 1591437596 [756098.158309] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [756098.158315] Lustre: Skipped 24 previous similar messages [756324.803080] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974dc638800, cur 1591438424 expire 1591438274 last 1591438197 [756699.299971] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [756699.299976] Lustre: Skipped 212 previous similar messages [756925.818974] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ee77ff800, cur 1591439025 expire 1591438875 last 1591438798 [757300.444349] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [757300.444354] Lustre: Skipped 20 previous similar messages [757526.842145] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bde750c00, cur 1591439626 expire 1591439476 last 1591439399 [757901.582818] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [757901.582823] Lustre: Skipped 48 previous similar messages [758127.865537] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b843cfc00, cur 1591440227 expire 1591440077 last 1591440000 [758502.728174] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [758502.728180] Lustre: Skipped 220 previous similar messages [758728.898456] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980f8497800, cur 1591440828 expire 1591440678 last 1591440601 [759103.868903] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [759103.868908] Lustre: Skipped 52 previous similar messages [759180.812609] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [759180.846101] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.36@o2ib (303): c: 32, oc: 0, rc: 32 [759329.911264] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b3fab7400, cur 1591441429 expire 1591441279 last 1591441202 [759705.010915] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [759705.010920] Lustre: Skipped 12 previous similar messages [759931.930826] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899cc3738800, cur 1591442031 expire 1591441881 last 1591441804 [760306.153781] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [760306.153786] Lustre: Skipped 10 previous similar messages [760532.952908] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f5fea6800, cur 1591442632 expire 1591442482 last 1591442405 [760907.295232] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [760907.295236] Lustre: Skipped 36 previous similar messages [761133.973995] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3b1a43000, cur 1591443233 expire 1591443083 last 1591443006 [761508.436343] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [761508.436348] Lustre: Skipped 222 previous similar messages [761734.999448] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a39bd7a000, cur 1591443834 expire 1591443684 last 1591443607 [762109.580519] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [762109.580524] Lustre: Skipped 48 previous similar messages [762336.020855] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e63e3d400, cur 1591444435 expire 1591444285 last 1591444208 [762710.725554] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [762710.725559] Lustre: Skipped 2 previous similar messages [762937.042560] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89737c318000, cur 1591445036 expire 1591444886 last 1591444809 [763311.870610] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [763311.870615] Lustre: Skipped 8 previous similar messages [763538.062099] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973fc34d800, cur 1591445637 expire 1591445487 last 1591445410 [763913.009233] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [763913.009239] Lustre: Skipped 100 previous similar messages [764139.085767] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89755c6c5400, cur 1591446238 expire 1591446088 last 1591446011 [764514.152185] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [764514.152190] Lustre: Skipped 38 previous similar messages [764741.109190] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899eaa710800, cur 1591446840 expire 1591446690 last 1591446613 [765115.294495] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [765115.294500] Lustre: Skipped 794 previous similar messages [765342.131829] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b4a49800, cur 1591447441 expire 1591447291 last 1591447214 [765716.436441] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [765716.436445] Lustre: Skipped 24 previous similar messages [765943.152074] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974aeac3400, cur 1591448042 expire 1591447892 last 1591447815 [766317.577127] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [766317.577132] Lustre: Skipped 46 previous similar messages [766544.172968] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899dd1fab400, cur 1591448643 expire 1591448493 last 1591448416 [766918.719181] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [766918.719186] Lustre: Skipped 56 previous similar messages [767145.195688] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ad56a1c00, cur 1591449244 expire 1591449094 last 1591449017 [767519.862161] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [767519.862167] Lustre: Skipped 6 previous similar messages [767746.217470] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898176b10000, cur 1591449845 expire 1591449695 last 1591449618 [768121.000981] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [768121.000986] Lustre: Skipped 272 previous similar messages [768347.240881] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899dcff18c00, cur 1591450446 expire 1591450296 last 1591450219 [768722.142831] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [768722.142836] Lustre: Skipped 188 previous similar messages [768948.263283] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a1b57cd400, cur 1591451047 expire 1591450897 last 1591450820 [769323.285245] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [769323.285250] Lustre: Skipped 52 previous similar messages [769550.287113] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a0397e2000, cur 1591451649 expire 1591451499 last 1591451422 [769924.427963] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [769924.427967] Lustre: Skipped 2 previous similar messages [770151.307598] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899eb17aa800, cur 1591452250 expire 1591452100 last 1591452023 [770525.571325] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [770525.571330] Lustre: Skipped 60 previous similar messages [770752.329324] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999102fb400, cur 1591452851 expire 1591452701 last 1591452624 [771126.716733] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [771126.716737] Lustre: Skipped 2 previous similar messages [771353.354649] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899724bb9800, cur 1591453452 expire 1591453302 last 1591453225 [771727.856320] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [771727.856326] Lustre: Skipped 6 previous similar messages [771954.373133] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ec8f0fc00, cur 1591454053 expire 1591453903 last 1591453826 [772328.996302] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [772328.996308] Lustre: Skipped 68 previous similar messages [772555.394690] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89796bb2a000, cur 1591454654 expire 1591454504 last 1591454427 [772930.139859] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [772930.139865] Lustre: Skipped 96 previous similar messages [773156.416121] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d7683c000, cur 1591455255 expire 1591455105 last 1591455028 [773531.283643] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [773531.283648] Lustre: Skipped 78 previous similar messages [773757.439976] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3e379f800, cur 1591455856 expire 1591455706 last 1591455629 [774132.429272] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [774132.429280] Lustre: Skipped 1596 previous similar messages [774358.463002] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e6e8c400, cur 1591456457 expire 1591456307 last 1591456230 [774733.570956] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [774733.570961] Lustre: Skipped 24 previous similar messages [774960.484400] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998566e4800, cur 1591457059 expire 1591456909 last 1591456832 [775141.399558] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [775141.433058] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.83@o2ib (330): c: 30, oc: 0, rc: 32 [775334.712354] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [775334.712359] Lustre: Skipped 424 previous similar messages [775494.412530] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [775494.446031] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.89@o2ib (303): c: 32, oc: 0, rc: 32 [775497.412664] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [775497.446165] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.95@o2ib (303): c: 32, oc: 0, rc: 32 [775542.414287] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [775542.447803] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [775542.481305] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.77@o2ib (312): c: 30, oc: 0, rc: 32 [775542.521933] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [775561.505122] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3dc7d2800, cur 1591457660 expire 1591457510 last 1591457433 [775561.578372] Lustre: Skipped 6 previous similar messages [775935.856337] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [775935.856342] Lustre: Skipped 6 previous similar messages [776059.433371] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [776059.466860] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [776059.500062] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.38@o2ib (283): c: 32, oc: 0, rc: 32 [776059.540410] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [776138.436174] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [776138.469666] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.224@o2ib (216): c: 32, oc: 0, rc: 32 [776162.529137] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899eb9e6c800, cur 1591458261 expire 1591458111 last 1591458034 [776176.437557] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [776176.471070] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [776176.504561] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.201@o2ib (304): c: 32, oc: 0, rc: 32 [776176.545482] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [776536.997881] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [776536.997887] Lustre: Skipped 750 previous similar messages [776763.549144] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999a97fa000, cur 1591458862 expire 1591458712 last 1591458635 [777138.139866] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [777138.139871] Lustre: Skipped 2 previous similar messages [777364.570348] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89982ea3ec00, cur 1591459463 expire 1591459313 last 1591459236 [777739.282739] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [777739.282744] Lustre: Skipped 2 previous similar messages [777965.593310] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997e27c0c00, cur 1591460064 expire 1591459914 last 1591459837 [778340.432274] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [778340.432278] Lustre: Skipped 162 previous similar messages [778566.615773] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89990864c000, cur 1591460665 expire 1591460515 last 1591460438 [778941.567053] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [778941.567058] Lustre: Skipped 420 previous similar messages [779167.636363] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995c87bbc00, cur 1591461266 expire 1591461116 last 1591461039 [779542.709333] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [779542.709338] Lustre: Skipped 64 previous similar messages [779769.659396] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a7babb000, cur 1591461868 expire 1591461718 last 1591461641 [780143.858112] Lustre: nbp8-MDT0000: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [780143.858117] Lustre: Skipped 166 previous similar messages [780370.681667] Lustre: nbp8-MDT0000: haven't heard from client f5088247-ee6a-eac8-a291-6310d22c7ff1 (at 10.149.2.243@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a41389000, cur 1591462469 expire 1591462319 last 1591462242 [780635.601195] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [780635.634692] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 15 previous similar messages [780635.668477] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.64@o2ib (304): c: 32, oc: 0, rc: 32 [780635.709113] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 15 previous similar messages [780808.498291] Lustre: MGS: Connection restored to b40dfaab-4365-2bca-92d7-86e92d97e282 (at 10.149.12.135@o2ib313) [780808.498296] Lustre: Skipped 72 previous similar messages [781474.367359] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [781474.367365] Lustre: Skipped 97 previous similar messages [782138.151934] Lustre: MGS: Connection restored to 725a4aad-5d7e-55bf-4982-56e010691922 (at 10.149.2.103@o2ib313) [782138.151940] Lustre: Skipped 117 previous similar messages [782839.146461] Lustre: MGS: Connection restored to f6528e1a-d015-a5e3-2341-480d93b8d7e9 (at 10.149.2.171@o2ib313) [782839.146467] Lustre: Skipped 103 previous similar messages [783446.213440] Lustre: MGS: Connection restored to 5a1bd894-6a9e-b499-18b5-4283cac1ef05 (at 10.149.15.64@o2ib313) [783446.213444] Lustre: Skipped 91 previous similar messages [784102.542305] Lustre: MGS: Connection restored to 910200e3-9930-8adf-07ac-b5d101c94778 (at 10.151.0.49@o2ib) [784102.542310] Lustre: Skipped 129 previous similar messages [784703.069354] Lustre: MGS: Connection restored to 21592ca1-3391-2c3b-91c2-421101bd02d5 (at 10.151.19.101@o2ib) [784703.069359] Lustre: Skipped 85 previous similar messages [784951.851840] Lustre: MGS: haven't heard from client ca238f1e-7994-d78d-29fc-fd124dc817d1 (at 10.151.55.142@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dcdb9f000, cur 1591467050 expire 1591466900 last 1591466823 [785029.762590] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [785029.796088] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [785029.829295] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.142@o2ib (305): c: 30, oc: 0, rc: 32 [785029.870217] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [785304.350814] Lustre: MGS: Connection restored to cad6bcce-e2f5-1576-7106-09acc732b2c5 (at 10.149.3.208@o2ib313) [785304.350820] Lustre: Skipped 159 previous similar messages [786177.550305] Lustre: MGS: Connection restored to 63e8bfe9-92f2-d53a-4cbd-8eb2fc535fc1 (at 10.151.28.129@o2ib) [786177.550311] Lustre: Skipped 137 previous similar messages [786876.249143] Lustre: MGS: Connection restored to 24692e94-be31-b331-d445-37c1bc22c97c (at 10.149.5.52@o2ib313) [786876.249149] Lustre: Skipped 127 previous similar messages [787713.113427] Lustre: MGS: Connection restored to 9d8d47e3-a497-da31-824b-75ab876b2e39 (at 10.151.32.97@o2ib) [787713.113433] Lustre: Skipped 17 previous similar messages [788331.137470] Lustre: MGS: Connection restored to 2794ba05-3ba7-b86d-6d5f-dd2e7c651287 (at 10.151.8.60@o2ib) [788331.137474] Lustre: Skipped 117 previous similar messages [788998.112026] Lustre: MGS: Connection restored to 8232d438-c839-0474-9f07-2a8041482e6b (at 10.151.14.30@o2ib) [788998.112032] Lustre: Skipped 11 previous similar messages [789674.029693] Lustre: MGS: Connection restored to aba2a613-b68b-09be-e68e-7bafd38a7258 (at 10.151.6.29@o2ib) [789674.029698] Lustre: Skipped 89 previous similar messages [790402.624957] Lustre: MGS: Connection restored to ff64177c-aa2d-aa11-0879-11350a394770 (at 10.151.53.202@o2ib) [790402.624962] Lustre: Skipped 115 previous similar messages [791054.307070] Lustre: MGS: Connection restored to 211389ea-0dfb-aa86-f6f0-9be827bab283 (at 10.141.3.19@o2ib417) [791054.307075] Lustre: Skipped 263 previous similar messages [791934.975849] Lustre: MGS: Connection restored to 30e3f430-d12a-d600-bc06-3eb38d111698 (at 10.151.35.59@o2ib) [791934.975854] Lustre: Skipped 123 previous similar messages [792559.542638] Lustre: MGS: Connection restored to c2bf2336-1e0a-a6ae-897e-af927c56651b (at 10.151.3.34@o2ib) [792559.542643] Lustre: Skipped 25 previous similar messages [793159.575841] Lustre: MGS: Connection restored to 5e44c96a-f56a-7fa8-f681-517306800fb7 (at 10.151.32.89@o2ib) [793159.575848] Lustre: Skipped 566 previous similar messages [793800.533805] Lustre: MGS: Connection restored to ccd15da9-d608-ddb8-40db-adecfdc23367 (at 10.149.14.164@o2ib313) [793800.533811] Lustre: Skipped 34 previous similar messages [794626.105393] Lustre: MGS: Connection restored to cc252453-0548-50bb-379e-c336bdd8d4ee (at 10.151.7.100@o2ib) [794626.105399] Lustre: Skipped 499 previous similar messages [795463.181087] Lustre: MGS: Connection restored to 942d078d-6c85-1232-7a9a-961ec51b4e94 (at 10.151.33.91@o2ib) [795463.181092] Lustre: Skipped 17 previous similar messages [796244.095440] Lustre: MGS: Connection restored to 5e89f64c-ef31-49ad-a872-706f27ed46b2 (at 10.151.39.115@o2ib) [796244.095446] Lustre: Skipped 135 previous similar messages [797152.468236] Lustre: MGS: Connection restored to 6ff8a968-d2d0-7df8-0b24-d55058a692df (at 10.151.34.139@o2ib) [797152.468241] Lustre: Skipped 95 previous similar messages [797851.183660] Lustre: MGS: Connection restored to faa4e121-4e1d-8866-120a-043914d845c4 (at 10.149.8.188@o2ib313) [797851.183665] Lustre: Skipped 69 previous similar messages [798504.457012] Lustre: MGS: Connection restored to 25a90e87-6a1d-ed04-7c8e-9434dd558de9 (at 10.141.6.119@o2ib417) [798504.457016] Lustre: Skipped 103 previous similar messages [799115.225358] Lustre: MGS: Connection restored to 9bf2d960-7966-81de-5dfb-17af3e0919f2 (at 10.149.3.12@o2ib313) [799115.225364] Lustre: Skipped 129 previous similar messages [799745.768353] Lustre: MGS: Connection restored to bafe10d2-8f70-a6f1-70f3-405dbc716b4c (at 10.149.1.126@o2ib313) [799745.768358] Lustre: Skipped 25 previous similar messages [800543.967625] Lustre: MGS: Connection restored to 133f3770-473d-e3c6-89ce-161fdfc0ba11 (at 10.151.23.16@o2ib) [800543.967630] Lustre: Skipped 97 previous similar messages [801428.326230] Lustre: MGS: Connection restored to 8bc85b91-6bff-9b42-7099-4cb0f1f4fb9d (at 10.151.34.49@o2ib) [801428.326234] Lustre: Skipped 9 previous similar messages [802305.091615] Lustre: MGS: Connection restored to 443110a3-63b9-7535-fa5c-56dfc2ddb97e (at 10.151.32.134@o2ib) [802305.091619] Lustre: Skipped 603 previous similar messages [803511.691908] Lustre: MGS: Connection restored to 2e5ebb1e-8c4a-fbdb-472f-faa4f07f0c76 (at 10.151.32.151@o2ib) [803511.691914] Lustre: Skipped 107 previous similar messages [804179.828379] Lustre: MGS: Connection restored to 8582ac40-d525-63e0-84fd-093e991e5483 (at 10.151.28.178@o2ib) [804179.828383] Lustre: Skipped 63 previous similar messages [804784.790136] Lustre: MGS: Connection restored to 4415ef8a-0397-e3eb-7f05-618e17a96011 (at 10.151.3.30@o2ib) [804784.790141] Lustre: Skipped 25 previous similar messages [805934.408601] Lustre: MGS: Connection restored to 1da6cc0c-9c54-1268-ba44-fdacb9d7a6fb (at 10.149.14.35@o2ib313) [805934.408607] Lustre: Skipped 1407 previous similar messages [806341.544122] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [806341.577618] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.53@o2ib (303): c: 32, oc: 0, rc: 32 [806849.786757] Lustre: MGS: Connection restored to a5e9d159-ad48-5bfb-81d2-e629e4c7a897 (at 10.151.29.140@o2ib) [806849.786763] Lustre: Skipped 85 previous similar messages [807587.626559] Lustre: MGS: Connection restored to dcfd0b54-b10d-0770-0097-5d4bdeca7854 (at 10.149.12.54@o2ib313) [807587.626565] Lustre: Skipped 1 previous similar message [808252.434248] Lustre: MGS: Connection restored to 1195a48a-0226-981c-b28a-468821b658d2 (at 10.149.1.251@o2ib313) [808252.434254] Lustre: Skipped 197 previous similar messages [808902.023354] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [808902.023359] Lustre: Skipped 71 previous similar messages [809502.370473] Lustre: nbp8-MDT0000: Connection restored to 8d656d57-d47c-80cf-4bbb-c9a5d873fa0a (at 10.151.30.137@o2ib) [809502.370479] Lustre: Skipped 178 previous similar messages [810130.933139] Lustre: MGS: Connection restored to 1033c834-76df-7d34-08e9-a795125ca97a (at 10.149.15.226@o2ib313) [810130.933144] Lustre: Skipped 50 previous similar messages [810748.749260] Lustre: MGS: Connection restored to fa2222c3-01e0-3680-b841-850900d4a820 (at 10.151.19.235@o2ib) [810748.749266] Lustre: Skipped 101 previous similar messages [811461.648593] Lustre: MGS: Connection restored to b2eb5b70-a542-24bb-3be3-2e81799cef3a (at 10.151.32.94@o2ib) [811461.648599] Lustre: Skipped 79 previous similar messages [812919.551920] Lustre: MGS: Connection restored to a862ef13-7814-edd8-2b03-d4b93f1617e5 (at 10.151.7.80@o2ib) [812919.551926] Lustre: Skipped 561 previous similar messages [813034.483852] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [813034.483858] Lustre: Skipped 3 previous similar messages [813366.565502] Lustre: MGS: Connection restored to 9ef084c1-6771-5f59-ff3a-5123ea137411 (at 10.151.3.33@o2ib) [813366.565507] Lustre: Skipped 1 previous similar message [813617.810972] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [813617.844472] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.17@o2ib (286): c: 32, oc: 0, rc: 32 [813632.901578] Lustre: nbp8-MDT0000: haven't heard from client 5db9023b-2fe1-401e-cffb-cf2733d05257 (at 10.141.6.150@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998c3bda400, cur 1591495730 expire 1591495580 last 1591495503 [813632.974825] Lustre: Skipped 1 previous similar message [813724.408982] Lustre: MGS: Connection restored to c92768bf-630d-f9ca-d6e2-fbc3352df78d (at 10.153.12.151@o2ib233) [813724.408988] Lustre: Skipped 111 previous similar messages [814599.854337] Lustre: MGS: Connection restored to 0b32930f-5a52-9154-5d52-110613ffb4cd (at 10.151.38.134@o2ib) [814599.854342] Lustre: Skipped 107 previous similar messages [815300.245196] Lustre: MGS: Connection restored to 3e2f9598-ae80-f1b9-c72c-a3c10ef8d3de (at 10.149.2.8@o2ib313) [815300.245201] Lustre: Skipped 351 previous similar messages [816094.508475] Lustre: MGS: Connection restored to 5616c354-7e95-50a2-6df0-b8a0f849a3dc (at 10.153.15.202@o2ib233) [816094.508481] Lustre: Skipped 123 previous similar messages [816729.547160] Lustre: MGS: Connection restored to f0d26fa4-6120-9234-59ad-9932e2012c4c (at 10.151.36.28@o2ib) [816729.547166] Lustre: Skipped 77 previous similar messages [817331.179991] Lustre: MGS: Connection restored to bd67eebe-76f4-ceaa-162d-5088c609a2f0 (at 10.141.7.59@o2ib417) [817331.179996] Lustre: Skipped 241 previous similar messages [818130.213436] Lustre: MGS: Connection restored to 491e00b0-454d-4700-45fb-c848daf07cd7 (at 10.151.37.184@o2ib) [818130.213441] Lustre: Skipped 19 previous similar messages [818890.257826] Lustre: MGS: Connection restored to 13654f68-e174-2c90-c31f-422461ad0e2a (at 10.151.37.121@o2ib) [818890.257831] Lustre: Skipped 133 previous similar messages [819020.008920] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [819020.042418] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.174@o2ib (303): c: 32, oc: 0, rc: 32 [819527.657866] Lustre: MGS: Connection restored to b4d5e6b8-1c3a-f060-dc2e-ca3910a48683 (at 10.149.3.188@o2ib313) [819527.657872] Lustre: Skipped 3 previous similar messages [820141.882472] Lustre: MGS: Connection restored to 988c0b4a-ae53-5426-7f55-8c077367cee4 (at 10.151.57.153@o2ib) [820141.882478] Lustre: Skipped 213 previous similar messages [821076.269895] Lustre: MGS: Connection restored to 4b4d762e-5eca-07d3-7568-e2d079f1af4a (at 10.149.14.16@o2ib313) [821076.269900] Lustre: Skipped 203 previous similar messages [821805.058730] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [821805.058736] Lustre: Skipped 199 previous similar messages [821818.201373] Lustre: MGS: haven't heard from client 6c2a06c0-85b4-75e6-0f5e-040bede80674 (at 10.153.14.21@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ed577a800, cur 1591503915 expire 1591503765 last 1591503688 [821818.272045] Lustre: Skipped 3 previous similar messages [821821.202367] Lustre: nbp8-MDT0000: haven't heard from client 14674ca2-b855-0dee-ad73-bc9313c48572 (at 10.153.14.21@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979d2a7f400, cur 1591503918 expire 1591503768 last 1591503691 [822285.218926] Lustre: nbp8-MDT0000: haven't heard from client 5e53b75d-4a97-e685-aea3-8f97d2e843bc (at 10.149.7.101@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89991bfbe000, cur 1591504382 expire 1591504232 last 1591504155 [822427.733588] Lustre: MGS: Connection restored to 3666d695-6ca9-cc0a-778c-3a059058c564 (at 10.151.37.178@o2ib) [822427.733593] Lustre: Skipped 249 previous similar messages [822862.150786] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [822862.184286] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.34@o2ib (303): c: 32, oc: 0, rc: 32 [823050.952980] Lustre: MGS: Connection restored to 138589c5-f104-aa41-5216-c6bd7db40b17 (at 10.151.14.240@o2ib) [823050.952985] Lustre: Skipped 129 previous similar messages [823440.261842] Lustre: MGS: haven't heard from client 28b5b0c8-f6f7-5d85-50ea-a16e3f082b90 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897427f9fc00, cur 1591505537 expire 1591505387 last 1591505310 [823440.332515] Lustre: Skipped 1 previous similar message [823444.260237] Lustre: nbp8-MDT0000: haven't heard from client 29cf79ca-956b-beea-627c-b9bb9fa0aa8b (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995e93d9400, cur 1591505541 expire 1591505391 last 1591505314 [823856.276460] Lustre: MGS: haven't heard from client 2d0baf40-62c9-6061-df93-6a398a4605f4 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975692d8400, cur 1591505953 expire 1591505803 last 1591505726 [823892.478058] Lustre: MGS: Connection restored to 5df1551a-3e8d-1005-12a6-30d0edf22592 (at 10.151.3.59@o2ib) [823892.478063] Lustre: Skipped 29 previous similar messages [824496.515688] Lustre: MGS: Connection restored to a332a380-3506-934a-3623-9e733ae168d3 (at 10.151.10.146@o2ib) [824496.515700] Lustre: Skipped 91 previous similar messages [825217.110020] Lustre: MGS: Connection restored to 87ebd042-39f9-e677-b796-d73f2892d807 (at 10.151.14.63@o2ib) [825217.110026] Lustre: Skipped 199 previous similar messages [825880.187433] Lustre: MGS: Connection restored to f4af1ce1-6c7d-5947-dbc8-0bc30ca449f3 (at 10.149.1.142@o2ib313) [825880.187438] Lustre: Skipped 227 previous similar messages [826493.777154] Lustre: MGS: Connection restored to 60988744-2fe6-e055-860b-b9fa4ff4ee1a (at 10.151.4.51@o2ib) [826493.777160] Lustre: Skipped 37 previous similar messages [826568.379021] Lustre: MGS: haven't heard from client 69d6f536-3526-a021-32ac-ed81dda9afe7 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b26b7400, cur 1591508665 expire 1591508515 last 1591508438 [826568.449699] Lustre: Skipped 1 previous similar message [826579.377068] Lustre: nbp8-MDT0000: haven't heard from client 820042a2-746f-51f5-2e83-3cbfa9be7f14 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998cf214c00, cur 1591508676 expire 1591508526 last 1591508449 [827067.399366] Lustre: MGS: haven't heard from client 7b0bae90-2477-09f0-909e-602fb4a22e1f (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978eee8fc00, cur 1591509164 expire 1591509014 last 1591508937 [827073.395415] Lustre: nbp8-MDT0000: haven't heard from client d1f9e160-512d-101e-22eb-7ed4cb571811 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897259558800, cur 1591509170 expire 1591509020 last 1591508943 [827122.050301] Lustre: MGS: Connection restored to 8be4b7a9-5469-7c65-bdd3-18ac7788f916 (at 10.149.9.134@o2ib313) [827122.050306] Lustre: Skipped 87 previous similar messages [827752.420114] Lustre: nbp8-MDT0000: haven't heard from client c34dd928-0c77-0b60-c966-bd3b83d5c9ce (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973ace5a800, cur 1591509849 expire 1591509699 last 1591509622 [827820.095217] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [827820.095223] Lustre: Skipped 39 previous similar messages [828204.442551] Lustre: MGS: haven't heard from client 3d612942-ead7-078e-ac10-d3b4635f8791 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dfe0c1400, cur 1591510301 expire 1591510151 last 1591510074 [828204.513224] Lustre: Skipped 1 previous similar message [828423.908915] Lustre: MGS: Connection restored to 094c0bec-800a-780a-9bfb-278ba5fca626 (at 10.141.2.239@o2ib417) [828423.908920] Lustre: Skipped 157 previous similar messages [829051.964949] Lustre: MGS: Connection restored to 243af763-4093-c57a-75c4-c673954fa3ef (at 10.151.50.101@o2ib) [829051.964955] Lustre: Skipped 247 previous similar messages [829780.468426] Lustre: MGS: Connection restored to cd9c588b-69d3-d7c9-d80e-88c341548cdd (at 10.151.4.148@o2ib) [829780.468432] Lustre: Skipped 11 previous similar messages [829942.502307] Lustre: MGS: haven't heard from client a554b572-2d5e-45d9-4be3-d8d92bcc99f4 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d41f87000, cur 1591512039 expire 1591511889 last 1591511812 [829942.572983] Lustre: Skipped 1 previous similar message [829962.502935] Lustre: nbp8-MDT0000: haven't heard from client 982ee888-2cd2-c130-48b2-c044a6b8a28e (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974cd08c800, cur 1591512059 expire 1591511909 last 1591511832 [830615.151669] Lustre: MGS: Connection restored to 6e886fa9-7e06-aafe-c831-bb67910317ad (at 10.149.14.67@o2ib313) [830615.151675] Lustre: Skipped 157 previous similar messages [831265.373734] Lustre: MGS: Connection restored to 17d9c427-09ba-0ad3-81da-b30ed712b48a (at 10.149.15.99@o2ib313) [831265.373740] Lustre: Skipped 121 previous similar messages [831945.703188] Process accounting resumed [831983.999876] Lustre: MGS: Connection restored to 2b0a0f6c-c528-064e-eb8a-7607672b8d2b (at 10.149.11.203@o2ib313) [831983.999881] Lustre: Skipped 567 previous similar messages [832053.586970] Lustre: MGS: haven't heard from client 35784539-5806-2719-a69d-ad7c5d7a5687 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897bf8aeec00, cur 1591514150 expire 1591514000 last 1591513923 [832056.578471] Lustre: nbp8-MDT0000: haven't heard from client 1a9dd0e7-8d3b-a7ff-dc0f-b34224989758 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8972c427f800, cur 1591514153 expire 1591514003 last 1591513926 [832546.610343] Lustre: nbp8-MDT0000: haven't heard from client bbe441d0-f037-e3a5-08ed-6ccc7eef4734 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89986ef1f800, cur 1591514643 expire 1591514493 last 1591514416 [832556.506936] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [832556.540429] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.31@o2ib (287): c: 32, oc: 0, rc: 32 [832600.934272] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [832600.934278] Lustre: Skipped 121 previous similar messages [833212.632293] Lustre: nbp8-MDT0000: haven't heard from client ac8ea650-f7cd-3fe3-0f15-cdee1a837ab4 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f1ceef000, cur 1591515309 expire 1591515159 last 1591515082 [833212.705566] Lustre: Skipped 1 previous similar message [833264.520116] Lustre: MGS: Connection restored to 77da37a1-42ce-36f4-b72d-bfc801af5a73 (at 10.151.55.177@o2ib) [833264.520122] Lustre: Skipped 993 previous similar messages [833660.637689] Lustre: MGS: haven't heard from client 4ce287b7-0a7b-5092-1b61-d93db81fa35a (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b9751000, cur 1591515757 expire 1591515607 last 1591515530 [833660.708363] Lustre: Skipped 1 previous similar message [833662.641327] Lustre: nbp8-MDT0000: haven't heard from client 7a0a78ec-e45a-888b-99f0-7e6d44e6690f (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f09b27800, cur 1591515759 expire 1591515609 last 1591515532 [833845.643226] Lustre: nbp8-MDT0000: haven't heard from client b180900f-b248-ae1c-b386-1ff8a4b365f3 (at 10.149.8.186@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89976ef70000, cur 1591515942 expire 1591515792 last 1591515715 [833906.094159] Lustre: MGS: Connection restored to a163dd3d-7f9b-7d99-2f5c-c9efd7a98296 (at 10.151.29.143@o2ib) [833906.094165] Lustre: Skipped 17 previous similar messages [834038.653014] Lustre: nbp8-MDT0000: haven't heard from client 894250da-2408-ef85-bb8f-49548612ada9 (at 10.151.30.200@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898159f9a400, cur 1591516135 expire 1591515985 last 1591515908 [834038.725695] Lustre: Skipped 1 previous similar message [834114.660260] Lustre: MGS: haven't heard from client 35fe66b8-834f-71a9-573e-71e4e26ec746 (at 10.151.49.246@o2ib) in 193 seconds. I think it's dead, and I am evicting it. exp ffff897ad6285c00, cur 1591516211 expire 1591516061 last 1591516018 [834114.730367] Lustre: Skipped 253 previous similar messages [834120.565329] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [834120.598832] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.215@o2ib (304): c: 30, oc: 0, rc: 32 [834121.564385] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [834121.597892] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [834121.631088] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.231@o2ib (308): c: 30, oc: 0, rc: 32 [834121.672018] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [834123.564451] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [834123.597956] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [834123.631442] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.221@o2ib (306): c: 30, oc: 0, rc: 32 [834123.672362] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [834126.565565] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [834126.599073] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [834126.632560] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.227@o2ib (310): c: 30, oc: 0, rc: 32 [834126.673480] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [834131.564766] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [834131.598276] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 11 previous similar messages [834131.632041] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.237@o2ib (313): c: 30, oc: 0, rc: 32 [834131.672962] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 11 previous similar messages [834141.566113] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [834141.599614] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 19 previous similar messages [834141.633387] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.51.203@o2ib (327): c: 30, oc: 0, rc: 32 [834141.674301] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 19 previous similar messages [834158.565841] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [834158.599349] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 35 previous similar messages [834158.633123] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.49.245@o2ib (344): c: 30, oc: 0, rc: 32 [834158.674036] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 35 previous similar messages [834260.569473] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [834260.602987] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 44 previous similar messages [834260.636768] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.49.246@o2ib (339): c: 30, oc: 0, rc: 32 [834260.677689] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 44 previous similar messages [834533.669011] Lustre: MGS: haven't heard from client d05ed8be-6d40-c343-fbf6-38eddcbe82f3 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89998372cc00, cur 1591516630 expire 1591516480 last 1591516403 [834533.739686] Lustre: Skipped 1 previous similar message [834617.766711] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [834617.766716] Lustre: Skipped 285 previous similar messages [835329.112622] Lustre: MGS: Connection restored to dab42435-195d-fafe-378b-c03775fd84c7 (at 10.149.15.238@o2ib313) [835329.112628] Lustre: Skipped 419 previous similar messages [836089.720721] Lustre: MGS: Connection restored to 738a9610-a5c3-d274-f637-23b151710368 (at 10.151.44.180@o2ib) [836089.720727] Lustre: Skipped 61 previous similar messages [836112.637551] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [836112.671046] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.181@o2ib (267): c: 32, oc: 0, rc: 32 [836671.749517] Lustre: MGS: haven't heard from client 6effbeb0-96c5-a7a9-f165-d66b65a1e7d6 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89727be67800, cur 1591518768 expire 1591518618 last 1591518541 [836671.820203] Lustre: Skipped 1 previous similar message [836694.748817] Lustre: nbp8-MDT0000: haven't heard from client db938ac1-24bb-e8fe-d0b9-164efc3ba62f (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979d2666400, cur 1591518791 expire 1591518641 last 1591518564 [836754.978257] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [836754.978262] Lustre: Skipped 9 previous similar messages [837369.784811] Lustre: MGS: Connection restored to 8b643904-a157-6433-e853-1f1d71b54be9 (at 10.141.3.184@o2ib417) [837369.784815] Lustre: Skipped 29 previous similar messages [837975.088868] Lustre: MGS: Connection restored to 8582ac40-d525-63e0-84fd-093e991e5483 (at 10.151.28.178@o2ib) [837975.088874] Lustre: Skipped 643 previous similar messages [838727.826757] Lustre: MGS: haven't heard from client f227e2a9-7525-ce6c-91b3-5ecb3d7a69db (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e96e39000, cur 1591520824 expire 1591520674 last 1591520597 [838729.823375] Lustre: nbp8-MDT0000: haven't heard from client 2b7e8877-27e3-2984-51ca-3e6d5b522f33 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3e3799c00, cur 1591520826 expire 1591520676 last 1591520599 [838790.232150] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [838790.232156] Lustre: Skipped 845 previous similar messages [839176.842900] Lustre: MGS: haven't heard from client d601ea70-3ecb-6322-e883-9ac505957021 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976aca40800, cur 1591521273 expire 1591521123 last 1591521046 [839216.840595] Lustre: nbp8-MDT0000: haven't heard from client 5d0b2a47-cd09-21ce-4130-a76cacf1e164 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897aa1659400, cur 1591521313 expire 1591521163 last 1591521086 [839592.596783] Lustre: MGS: Connection restored to 5b3c44f0-ea6b-865f-628f-2384b2ae481c (at 10.151.37.26@o2ib) [839592.596789] Lustre: Skipped 49 previous similar messages [839882.865420] Lustre: nbp8-MDT0000: haven't heard from client f9515378-517d-f0b0-7d00-77a36f2f1d1a (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a060b8bc00, cur 1591521979 expire 1591521829 last 1591521752 [839883.870041] Lustre: MGS: haven't heard from client 61d3bcdb-1e1f-1bb0-108e-a8a11e9250a5 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c1bfdb400, cur 1591521980 expire 1591521830 last 1591521753 [840287.710315] Lustre: MGS: Connection restored to 523c2977-4441-5295-2034-963785195b28 (at 10.149.4.158@o2ib313) [840287.710321] Lustre: Skipped 67 previous similar messages [840328.885000] Lustre: MGS: haven't heard from client 87ffdf24-9e90-9495-fe4c-5b7952ebea9b (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b8badc00, cur 1591522425 expire 1591522275 last 1591522198 [840771.899888] Lustre: MGS: haven't heard from client 1e483f20-1a83-dc65-2122-60592389ed39 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f257ff800, cur 1591522868 expire 1591522718 last 1591522641 [840771.970612] Lustre: Skipped 1 previous similar message [841215.915924] Lustre: MGS: haven't heard from client 7d124b28-c4ff-5070-a2f2-e9e444513e2a (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c6b9d800, cur 1591523312 expire 1591523162 last 1591523085 [841215.986612] Lustre: Skipped 1 previous similar message [841228.009121] Lustre: MGS: Connection restored to 897f8d54-9b6f-d64d-e056-fc0ab9cab716 (at 10.141.6.153@o2ib417) [841228.009127] Lustre: Skipped 335 previous similar messages [841236.915194] Lustre: nbp8-MDT0000: haven't heard from client 5dc9ea23-f152-9b5e-1aa7-12f274afa22a (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2ec621400, cur 1591523333 expire 1591523183 last 1591523106 [842309.522159] Lustre: MGS: Connection restored to e6185ddf-437c-969c-0b90-788a8b525d69 (at 10.149.8.25@o2ib313) [842309.522165] Lustre: Skipped 185 previous similar messages [842677.980531] Lustre: MGS: haven't heard from client a8d11cb7-3a26-6c0e-77f1-3f1cf9c99f66 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973f0f3a800, cur 1591524774 expire 1591524624 last 1591524547 [842695.967171] Lustre: nbp8-MDT0000: haven't heard from client 6c165332-4bd0-6774-14f2-9b5653ecffb0 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e51ee9400, cur 1591524792 expire 1591524642 last 1591524565 [843160.901840] Lustre: MGS: Connection restored to 60f0dc2d-2480-f061-6991-38428889933e (at 10.151.34.78@o2ib) [843160.901847] Lustre: Skipped 113 previous similar messages [843907.987151] Lustre: MGS: Connection restored to 63538d16-360f-fddc-c990-7ef1fa02325a (at 10.151.12.138@o2ib) [843907.987157] Lustre: Skipped 113 previous similar messages [844736.614113] Lustre: MGS: Connection restored to 1f07c8b7-6abd-85c1-ed26-4fbcfbed3661 (at 10.151.36.85@o2ib) [844736.614122] Lustre: Skipped 305 previous similar messages [844767.043308] Lustre: MGS: haven't heard from client 2b3b119f-7c04-50cd-a53f-8671324fdc91 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897988b97c00, cur 1591526863 expire 1591526713 last 1591526636 [844772.043139] Lustre: nbp8-MDT0000: haven't heard from client de7762ec-eb9b-e2eb-0484-85afea083831 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a321f61000, cur 1591526868 expire 1591526718 last 1591526641 [845262.064325] Lustre: MGS: haven't heard from client adf78666-a406-cc61-ee93-a43c6c6e2c82 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dfacddc00, cur 1591527358 expire 1591527208 last 1591527131 [845267.062113] Lustre: nbp8-MDT0000: haven't heard from client 573954bd-ca84-8cb4-b512-c63b22f5ede2 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ba82f5c00, cur 1591527363 expire 1591527213 last 1591527136 [845772.668574] Lustre: MGS: Connection restored to 3983c90b-b312-b0d0-11fc-72c1ad174b2e (at 10.151.51.112@o2ib) [845772.668581] Lustre: Skipped 43 previous similar messages [846045.090065] Lustre: nbp8-MDT0000: haven't heard from client e08798d3-21d6-cb88-d2aa-01a3c45c879f (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974bba82c00, cur 1591528141 expire 1591527991 last 1591527914 [846047.091525] Lustre: MGS: haven't heard from client 9020a4bf-8be6-5e97-941f-d9aeb5ea81aa (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973fc759c00, cur 1591528143 expire 1591527993 last 1591527916 [846376.898125] Lustre: MGS: Connection restored to 2d4062d4-c700-2bb0-1236-5b325eff9d43 (at 10.151.19.170@o2ib) [846376.898130] Lustre: Skipped 127 previous similar messages [846512.111520] Lustre: MGS: haven't heard from client 96cffe32-943d-c836-cbd5-c635c607ac32 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e8b2d800, cur 1591528608 expire 1591528458 last 1591528381 [846555.109135] Lustre: nbp8-MDT0000: haven't heard from client 0c025bc4-03ff-f5b4-4004-77173e71ade7 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973e1b43000, cur 1591528651 expire 1591528501 last 1591528424 [846990.129580] Lustre: MGS: haven't heard from client bea59367-97f8-0f86-ecca-71336845551d (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b8baf800, cur 1591529086 expire 1591528936 last 1591528859 [847063.876567] Lustre: MGS: Connection restored to c76df0be-f603-1410-39ae-40c729c5bf25 (at 10.151.19.131@o2ib) [847063.876573] Lustre: Skipped 237 previous similar messages [847594.149469] Lustre: MGS: haven't heard from client a8035340-4b7a-82cb-d818-21276f8d7edb (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89725ff8e800, cur 1591529690 expire 1591529540 last 1591529463 [847594.220153] Lustre: Skipped 1 previous similar message [847688.752277] Lustre: MGS: Connection restored to 3b12227f-a05e-adf2-cd20-a28f4d16e51e (at 10.141.3.86@o2ib417) [847688.752283] Lustre: Skipped 255 previous similar messages [848485.628236] Lustre: MGS: Connection restored to 006c561a-6089-a435-1f4d-80a197fd5f30 (at 10.151.3.66@o2ib) [848485.628242] Lustre: Skipped 63 previous similar messages [848634.184936] Lustre: nbp8-MDT0000: haven't heard from client 6b37a993-3ec7-2e98-74fb-7ff94ffacbe8 (at 10.149.7.101@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a0023e8000, cur 1591530730 expire 1591530580 last 1591530503 [848634.258206] Lustre: Skipped 1 previous similar message [848647.187984] Lustre: MGS: haven't heard from client 65dd07a0-903f-3621-a628-f884afe932ba (at 10.149.7.101@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ee59a0800, cur 1591530743 expire 1591530593 last 1591530516 [849126.204533] Lustre: MGS: haven't heard from client c4d68ab1-d1c3-69b9-8b38-a52377c6b830 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979023dec00, cur 1591531222 expire 1591531072 last 1591530995 [849195.748034] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [849195.748040] Lustre: Skipped 25 previous similar messages [849944.769053] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [849944.769058] Lustre: Skipped 1 previous similar message [850938.045675] Lustre: MGS: Connection restored to a2f6e70a-065c-41da-2264-f0e266e9bb2c (at 10.149.13.7@o2ib313) [850938.045680] Lustre: Skipped 57 previous similar messages [851254.280863] Lustre: MGS: haven't heard from client b31a2e4f-9bb9-4996-7722-7b7c1c3aa3e4 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c94fad000, cur 1591533350 expire 1591533200 last 1591533123 [851254.351556] Lustre: Skipped 1 previous similar message [851274.282133] Lustre: nbp8-MDT0000: haven't heard from client a9038eff-76ce-dfff-020a-850807da68d8 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e5d32bc00, cur 1591533370 expire 1591533220 last 1591533143 [851602.108103] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [851602.108108] Lustre: Skipped 193 previous similar messages [851677.297309] Lustre: nbp8-MDT0000: haven't heard from client 21500cf3-3254-ae44-7b00-1dbc936dd343 (at 10.153.16.53@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f41721c00, cur 1591533773 expire 1591533623 last 1591533546 [852304.152093] Lustre: MGS: Connection restored to b8e63df8-d5b9-9a4d-3bd9-09a74c928e4d (at 10.149.14.60@o2ib313) [852304.152098] Lustre: Skipped 37 previous similar messages [852507.327308] Lustre: MGS: haven't heard from client 519cb93e-222c-f7a9-4b7d-72b70552f332 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979d1ef1400, cur 1591534603 expire 1591534453 last 1591534376 [852507.398010] Lustre: Skipped 1 previous similar message [852530.327744] Lustre: nbp8-MDT0000: haven't heard from client d1cfcf14-ca44-e5b8-8879-7747604de4ad (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89996aa52c00, cur 1591534626 expire 1591534476 last 1591534399 [853363.328748] Lustre: MGS: Connection restored to c5376847-fb73-b364-bdfe-eb222098ae77 (at 10.151.52.239@o2ib) [853363.328754] Lustre: Skipped 95 previous similar messages [854037.911787] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [854037.911793] Lustre: Skipped 179 previous similar messages [854142.391067] Lustre: MGS: haven't heard from client 9a3ff86e-8f94-e25f-4795-a9e08542432a (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897491f19400, cur 1591536238 expire 1591536088 last 1591536011 [854153.389038] Lustre: nbp8-MDT0000: haven't heard from client f7345dcf-1885-da9b-9acf-4a421144f1df (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899cc373a400, cur 1591536249 expire 1591536099 last 1591536022 [854640.273168] Lustre: MGS: Connection restored to 4abdd9e9-76e2-5e29-ceb9-4b3b2e7cf4ee (at 10.151.34.51@o2ib) [854640.273174] Lustre: Skipped 637 previous similar messages [855721.447642] Lustre: MGS: haven't heard from client 44ac407a-b5c8-c193-4e8e-7d809a741547 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89778da3f800, cur 1591537817 expire 1591537667 last 1591537590 [855731.446321] Lustre: nbp8-MDT0000: haven't heard from client ab27d01b-184e-2acb-99d5-6b8df4d88ebc (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974bc3f6400, cur 1591537827 expire 1591537677 last 1591537600 [855731.519686] LustreError: 6560:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.153.10.75@o2ib233) failed to reply to blocking AST (req@ffff896b66ac1680 x1667959236700864 status 0 rc -5), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff899a98fb9b00/0xa22cee35ea47192d lrc: 4/0,0 mode: PR/PR res: [0x3608ac7ec:0x8f7a:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x54a01400000020 nid: 10.153.10.75@o2ib233 remote: 0x221352e702dce30 expref: 1238 pid: 12642 timeout: 855956 lvb_type: 0 [855731.662828] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.153.10.75@o2ib233 was evicted due to a lock blocking callback time out: rc -5 [855807.207085] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [855807.207090] Lustre: Skipped 27 previous similar messages [856415.499425] Lustre: MGS: Connection restored to 014beae4-5dd5-a430-4c09-500be3b621b4 (at 10.149.15.25@o2ib313) [856415.499431] Lustre: Skipped 61 previous similar messages [856656.480997] Lustre: MGS: haven't heard from client 90205cd2-8812-2557-a8e8-8f948bac50e7 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ea4765400, cur 1591538752 expire 1591538602 last 1591538525 [856671.481048] Lustre: nbp8-MDT0000: haven't heard from client 1596fae1-cca4-b303-ab0b-c963a0761b4e (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998acba6000, cur 1591538767 expire 1591538617 last 1591538540 [857049.033361] Lustre: MGS: Connection restored to 147c1b90-381e-dbdf-bfcc-1902826cfd60 (at 10.149.15.103@o2ib313) [857049.033367] Lustre: Skipped 73 previous similar messages [857094.497721] Lustre: MGS: haven't heard from client af2ed1ca-34ec-a501-1d40-5adf9c7267b6 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973c474c400, cur 1591539190 expire 1591539040 last 1591538963 [857108.496532] Lustre: nbp8-MDT0000: haven't heard from client ba0d1172-7041-f464-7da7-e7bf72896264 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ddaac6c00, cur 1591539204 expire 1591539054 last 1591538977 [857527.512368] Lustre: MGS: haven't heard from client 429dfb28-a3cb-2a45-d812-0d9f0af1fe3d (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974d3f9fc00, cur 1591539623 expire 1591539473 last 1591539396 [857533.511290] LustreError: 14088:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.153.10.75@o2ib233) failed to reply to blocking AST (req@ffff89a070fe8480 x1667959248056640 status 0 rc -5), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff897db57c9440/0xa22cee35f325778d lrc: 4/0,0 mode: PR/PR res: [0x3608ac7ec:0x8f79:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x50200400000020 nid: 10.153.10.75@o2ib233 remote: 0x173a889b3357ce41 expref: 1235 pid: 14120 timeout: 857786 lvb_type: 0 [857533.654905] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.153.10.75@o2ib233 was evicted due to a lock blocking callback time out: rc -5 [857943.342536] Lustre: MGS: Connection restored to e3c3387d-e8b9-9075-7486-e3fb23c7cf40 (at 10.149.15.115@o2ib313) [857943.342542] Lustre: Skipped 33 previous similar messages [858548.302923] Lustre: MGS: Connection restored to 541d213d-9f66-f832-9d24-8f99f662cf6b (at 10.149.11.24@o2ib313) [858548.302929] Lustre: Skipped 47 previous similar messages [858557.555870] Lustre: MGS: haven't heard from client e723067f-6874-3786-3233-b80aeca8dd25 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e532eb800, cur 1591540653 expire 1591540503 last 1591540426 [858557.626553] Lustre: Skipped 1 previous similar message [858574.549984] Lustre: nbp8-MDT0000: haven't heard from client 4834967d-007b-db2e-c6a1-41a6db04dc98 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897eb2336800, cur 1591540670 expire 1591540520 last 1591540443 [858998.576587] Lustre: MGS: haven't heard from client 4ba0a430-2957-045d-7638-64979725ddc8 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897aca650000, cur 1591541094 expire 1591540944 last 1591540867 [859013.567367] Lustre: nbp8-MDT0000: haven't heard from client 09b8bd7f-068f-1c67-b7d8-5d5cb63c5140 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a00874d000, cur 1591541109 expire 1591540959 last 1591540882 [859451.584720] Lustre: MGS: haven't heard from client c330f86f-9ea8-9e02-0719-c21af57e2062 (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a056a47c00, cur 1591541547 expire 1591541397 last 1591541320 [859538.035905] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [859538.035911] Lustre: Skipped 41 previous similar messages [859792.598799] Lustre: MGS: haven't heard from client ef313767-507a-20d6-6d4e-1750cacf1fab (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975107b9400, cur 1591541888 expire 1591541738 last 1591541661 [859792.669474] Lustre: Skipped 1 previous similar message [860083.605092] Lustre: MGS: haven't heard from client 2cf489ca-fb0e-f018-afc3-b6e334769a0f (at 10.153.10.75@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897aa83a2800, cur 1591542179 expire 1591542029 last 1591541952 [860083.675765] Lustre: Skipped 1 previous similar message [860146.135099] Lustre: MGS: Connection restored to f2dbfe86-725f-3d26-3e81-5cc75f88dcf4 (at 10.153.10.75@o2ib233) [860146.135105] Lustre: Skipped 25 previous similar messages [860911.839514] Lustre: MGS: Connection restored to d516657b-9de0-109e-9cfe-3c1b46fd0f2e (at 10.149.4.244@o2ib313) [860911.839520] Lustre: Skipped 107 previous similar messages [861548.746402] Lustre: MGS: Connection restored to 4bda5cb9-0d35-f970-0420-5d72acc8483c (at 10.151.11.53@o2ib) [861548.746408] Lustre: Skipped 339 previous similar messages [862259.067221] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [862259.067227] Lustre: Skipped 139 previous similar messages [862939.837474] Lustre: MGS: Connection restored to 5bb58f62-f765-9a15-3222-73e52ab289eb (at 10.149.12.142@o2ib313) [862939.837480] Lustre: Skipped 31 previous similar messages [863609.570509] Lustre: MGS: Connection restored to 672f5146-477f-c0ad-1db3-b32db52b737a (at 10.153.10.86@o2ib233) [863609.570515] Lustre: Skipped 171 previous similar messages [864433.467635] Lustre: MGS: Connection restored to 64bc51ad-3275-2eb9-38dd-991e9fdf0a9f (at 10.151.37.183@o2ib) [864433.467641] Lustre: Skipped 441 previous similar messages [865097.172924] Lustre: MGS: Connection restored to 404d1867-b89a-b5b3-71bb-2ad17d303f3d (at 10.151.32.113@o2ib) [865097.172929] Lustre: Skipped 131 previous similar messages [866068.524201] Lustre: MGS: Connection restored to 0a58e842-a386-d062-a9cd-7334e8560f65 (at 10.149.15.101@o2ib313) [866068.524208] Lustre: Skipped 79 previous similar messages [866671.549943] Lustre: MGS: Connection restored to 147c1b90-381e-dbdf-bfcc-1902826cfd60 (at 10.149.15.103@o2ib313) [866671.549949] Lustre: Skipped 301 previous similar messages [867507.895132] Lustre: MGS: Connection restored to 5b7c01e8-d13f-5021-5d72-341095a930f8 (at 10.141.6.116@o2ib417) [867507.895137] Lustre: Skipped 95 previous similar messages [868348.627906] Lustre: MGS: Connection restored to ae46beab-118b-7f4f-16e2-1d2947d3124c (at 10.151.43.16@o2ib) [868348.627912] Lustre: Skipped 141 previous similar messages [869582.013780] Lustre: MGS: Connection restored to 1cf494bc-d035-b3c5-7058-ec583d6c6a84 (at 10.149.2.190@o2ib313) [869582.013785] Lustre: Skipped 221 previous similar messages [869668.573009] Lustre: MGS: Connection restored to d249c73b-cbd2-d491-9d7b-fbfbb894c141 (at 10.149.1.157@o2ib313) [869668.573015] Lustre: Skipped 19 previous similar messages [869989.289096] Lustre: MGS: Connection restored to 885c1fc2-aaa6-396a-aca3-dcf27edfc5a9 (at 10.151.47.28@o2ib) [869989.289102] Lustre: Skipped 51 previous similar messages [870377.168004] Lustre: MGS: Connection restored to 11a76c06-767e-4805-9c13-08a564dddcaa (at 10.151.7.113@o2ib) [870377.168010] Lustre: Skipped 11 previous similar messages [871163.660303] Lustre: MGS: Connection restored to ae2ce2d7-609c-31c4-7188-49bb27b41c88 (at 10.151.3.45@o2ib) [871163.660308] Lustre: Skipped 5 previous similar messages [871821.990422] Lustre: MGS: Connection restored to b2eb5b70-a542-24bb-3be3-2e81799cef3a (at 10.151.32.94@o2ib) [871821.990427] Lustre: Skipped 107 previous similar messages [872942.854425] Lustre: MGS: Connection restored to 426cbe01-9d72-0dce-c774-8b5cc595c3d4 (at 10.151.39.4@o2ib) [872942.854431] Lustre: Skipped 129 previous similar messages [873691.754678] Lustre: MGS: Connection restored to 17abbf57-4080-00c0-3695-a772594ae835 (at 10.151.46.92@o2ib) [873691.754684] Lustre: Skipped 7 previous similar messages [874309.262605] Lustre: MGS: Connection restored to f445d0d1-9ba9-9ecc-e477-e71233b3e595 (at 10.141.2.242@o2ib417) [874309.262611] Lustre: Skipped 703 previous similar messages [875114.906520] Lustre: MGS: Connection restored to 4d6ba510-3de3-4d34-763d-5cb0801248be (at 10.141.6.122@o2ib417) [875114.906526] Lustre: Skipped 65 previous similar messages [875905.833817] Lustre: MGS: Connection restored to 2fcc5742-19f6-fd63-49ae-a7a4892ec56a (at 10.141.2.241@o2ib417) [875905.833822] Lustre: Skipped 41 previous similar messages [876511.858963] Lustre: MGS: Connection restored to a422b349-5ae1-b84f-418d-2b86eaac1766 (at 10.149.14.101@o2ib313) [876511.858969] Lustre: Skipped 99 previous similar messages [877718.769051] Lustre: MGS: Connection restored to 83ff9ce1-fb2b-383e-3dd4-f11d954b8b3b (at 10.149.4.167@o2ib313) [877718.769056] Lustre: Skipped 163 previous similar messages [878507.962118] Lustre: MGS: Connection restored to d1baac4c-84bd-0896-fc97-fc118712b136 (at 10.149.5.136@o2ib313) [878507.962125] Lustre: Skipped 79 previous similar messages [879226.446500] Lustre: MGS: Connection restored to 6e3e761b-6b03-ad1c-f6d1-ce952f2ed2af (at 10.151.35.179@o2ib) [879226.446506] Lustre: Skipped 137 previous similar messages [879993.992692] Lustre: MGS: Connection restored to f5e42e90-81d5-2b9a-706a-ba449263158e (at 10.151.32.223@o2ib) [879993.992698] Lustre: Skipped 59 previous similar messages [880125.339657] Lustre: nbp8-MDT0000: haven't heard from client d103c68d-e35d-ca7a-72cb-df6f9fdf4ac3 (at 10.151.28.210@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e4bbe6800, cur 1591562220 expire 1591562070 last 1591561993 [880125.412332] Lustre: Skipped 1 previous similar message [880145.251039] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [880145.284539] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.41.231@o2ib (230): c: 32, oc: 0, rc: 32 [880196.253043] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [880196.286549] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.41.233@o2ib (303): c: 32, oc: 0, rc: 32 [880199.253162] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [880199.286662] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [880199.320143] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.41.239@o2ib (281): c: 32, oc: 0, rc: 32 [880199.361063] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [880238.254478] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [880238.287986] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.210@o2ib (339): c: 30, oc: 0, rc: 32 [880416.351937] Lustre: MGS: haven't heard from client 109ba4a5-5e63-8cf4-7727-3cf3888e9714 (at 10.153.10.93@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89727be58400, cur 1591562511 expire 1591562361 last 1591562284 [880416.422616] Lustre: Skipped 3 previous similar messages [881182.067917] Lustre: MGS: Connection restored to 9941b52e-535f-937a-b931-8de5817d18a3 (at 10.149.1.156@o2ib313) [881182.067923] Lustre: Skipped 125 previous similar messages [881855.035619] Lustre: MGS: Connection restored to 558aa337-5eb5-4f5d-9423-a31aa83391db (at 10.151.33.85@o2ib) [881855.035625] Lustre: Skipped 429 previous similar messages [882069.321753] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [882069.355254] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [882069.388463] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.121@o2ib (304): c: 32, oc: 0, rc: 32 [882069.429385] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [882607.405093] Lustre: MGS: Connection restored to 379de0db-1d82-7e3b-8413-27f879d61d42 (at 10.141.6.75@o2ib417) [882607.405098] Lustre: Skipped 391 previous similar messages [883214.305110] Lustre: MGS: Connection restored to c012055f-46ab-ae95-b823-1755f217ce32 (at 10.149.14.3@o2ib313) [883214.305116] Lustre: Skipped 301 previous similar messages [883917.865209] Lustre: MGS: Connection restored to c8872b05-8e08-92c2-2259-7129751dd471 (at 10.151.24.94@o2ib) [883917.865215] Lustre: Skipped 49 previous similar messages [884522.705981] Lustre: MGS: Connection restored to 7755856f-492b-ee89-c31e-96354a7be8f3 (at 10.151.2.38@o2ib) [884522.705994] Lustre: Skipped 109 previous similar messages [886188.456321] Lustre: MGS: Connection restored to abf52ae5-a1de-18ef-0913-d061cd7d2677 (at 10.151.33.181@o2ib) [886188.456326] Lustre: Skipped 367 previous similar messages [886360.617447] Lustre: MGS: Connection restored to 49221635-5f6e-a02e-d61c-c39e909c2ff4 (at 10.151.3.180@o2ib) [886360.617453] Lustre: Skipped 1 previous similar message [886952.981267] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [886952.981273] Lustre: Skipped 49 previous similar messages [887294.549024] Lustre: MGS: Connection restored to 7755856f-492b-ee89-c31e-96354a7be8f3 (at 10.151.2.38@o2ib) [887294.549030] Lustre: Skipped 67 previous similar messages [888257.136038] Lustre: MGS: Connection restored to 370372fc-fb65-db55-c529-62a6dcce9d34 (at 10.149.4.33@o2ib313) [888257.136044] Lustre: Skipped 211 previous similar messages [888861.005839] Lustre: MGS: Connection restored to efa35531-2485-1d66-b146-376f080e33c5 (at 10.149.14.117@o2ib313) [888861.005845] Lustre: Skipped 335 previous similar messages [889312.678545] Lustre: nbp8-MDT0000: haven't heard from client 1810e3a2-e72d-1109-2311-a167b4a1e56e (at 10.153.17.93@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f52e6b400, cur 1591571407 expire 1591571257 last 1591571180 [889312.751797] Lustre: Skipped 1 previous similar message [889519.206282] Lustre: MGS: Connection restored to 06f9ac56-cb49-3d6a-014b-f67c39168f28 (at 10.151.3.159@o2ib) [889519.206288] Lustre: Skipped 195 previous similar messages [890213.936159] Lustre: MGS: Connection restored to 7755856f-492b-ee89-c31e-96354a7be8f3 (at 10.151.2.38@o2ib) [890213.936165] Lustre: Skipped 13 previous similar messages [890948.823700] Lustre: MGS: Connection restored to 57df8b41-89ec-7f6b-e6c0-2128b66ceb5d (at 10.151.7.94@o2ib) [890948.823705] Lustre: Skipped 93 previous similar messages [891736.305658] Lustre: MGS: Connection restored to 58271c12-d647-2709-c10e-8d4578fcc80d (at 10.151.35.170@o2ib) [891736.305663] Lustre: Skipped 105 previous similar messages [892908.870795] Lustre: MGS: Connection restored to f5e1f854-b1e0-0a23-a85d-126023cb26cf (at 10.151.54.76@o2ib) [892908.870801] Lustre: Skipped 437 previous similar messages [893675.008372] Lustre: MGS: Connection restored to 62b574af-4e10-67fa-560e-1c877f8bad4a (at 10.151.3.87@o2ib) [893675.008378] Lustre: Skipped 27 previous similar messages [894362.784388] Lustre: MGS: Connection restored to 99105a86-d7a8-9ad6-786a-3326323e63f5 (at 10.151.32.104@o2ib) [894362.784394] Lustre: Skipped 167 previous similar messages [895065.585137] Lustre: MGS: Connection restored to 243af763-4093-c57a-75c4-c673954fa3ef (at 10.151.50.101@o2ib) [895065.585142] Lustre: Skipped 135 previous similar messages [895940.550398] Lustre: MGS: Connection restored to 0d223501-007f-93a7-350d-877e67b8018d (at 10.151.1.233@o2ib) [895940.550404] Lustre: Skipped 237 previous similar messages [896546.033891] Lustre: MGS: Connection restored to 6652fd8f-2799-4785-e399-db3b7f5a4662 (at 10.151.12.189@o2ib) [896546.033897] Lustre: Skipped 235 previous similar messages [897274.527360] Lustre: MGS: Connection restored to 17779bf1-e591-6257-9cde-3dc72871da05 (at 10.149.4.93@o2ib313) [897274.527365] Lustre: Skipped 131 previous similar messages [897947.448602] Lustre: MGS: Connection restored to d25679f2-b8bb-aae5-d9cf-857c633312fa (at 10.151.24.55@o2ib) [897947.448607] Lustre: Skipped 897 previous similar messages [898615.343911] Lustre: MGS: Connection restored to 648ecf60-2203-1161-2e16-8e0c3b3d89d9 (at 10.149.4.236@o2ib313) [898615.343916] Lustre: Skipped 123 previous similar messages [899272.130690] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [899272.130696] Lustre: Skipped 145 previous similar messages [899887.756310] Lustre: MGS: Connection restored to b790c1f3-bac4-d325-7dfa-d60268853ca6 (at 10.151.4.157@o2ib) [899887.756316] Lustre: Skipped 29 previous similar messages [900691.130424] Lustre: MGS: Connection restored to 87f19c83-5f0b-c2a4-65e6-7d6921f4cc25 (at 10.151.54.94@o2ib) [900691.130429] Lustre: Skipped 101 previous similar messages [901507.050549] Lustre: MGS: Connection restored to 721b8689-d692-b08d-ab0f-2f79bed7a3f9 (at 10.151.39.203@o2ib) [901507.050555] Lustre: Skipped 59 previous similar messages [902232.678367] Lustre: MGS: Connection restored to 6df683d3-85c0-be49-61e0-a2596506bdf8 (at 10.151.31.11@o2ib) [902232.678373] Lustre: Skipped 23 previous similar messages [902832.922730] Lustre: nbp8-MDT0000: Connection restored to e468a103-d414-c56e-6f11-0bcf0878d0dd (at 10.151.53.140@o2ib) [902832.922736] Lustre: Skipped 144 previous similar messages [903729.233789] Lustre: MGS: Connection restored to b91e2ad0-266d-bac8-313a-0965ad7d6803 (at 10.151.49.95@o2ib) [903729.233795] Lustre: Skipped 10 previous similar messages [904363.968561] Lustre: MGS: Connection restored to a3d9c36d-ae44-d070-29c3-c67ea0ab58c7 (at 10.153.13.220@o2ib233) [904363.968566] Lustre: Skipped 79 previous similar messages [904968.062363] Lustre: MGS: Connection restored to 265d3380-f9c2-915b-28a9-3437399a2b01 (at 10.151.38.147@o2ib) [904968.062369] Lustre: Skipped 191 previous similar messages [905605.542589] Lustre: MGS: Connection restored to b9f754c0-ff77-1c7c-89f4-3a3a92a1e1f0 (at 10.151.33.17@o2ib) [905605.542595] Lustre: Skipped 363 previous similar messages [906329.304152] Lustre: MGS: haven't heard from client cd1ba036-f023-c567-615a-90c80d4c5c1b (at 10.151.38.135@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897bf8ae9000, cur 1591588423 expire 1591588273 last 1591588196 [906329.374262] Lustre: Skipped 1 previous similar message [906331.303258] Lustre: MGS: Connection restored to 5b7c01e8-d13f-5021-5d72-341095a930f8 (at 10.141.6.116@o2ib417) [906331.303264] Lustre: Skipped 33 previous similar messages [906428.217490] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [906428.250983] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.135@o2ib (325): c: 30, oc: 0, rc: 32 [906479.219360] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [906479.252861] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.136@o2ib (332): c: 30, oc: 0, rc: 32 [906975.326639] Lustre: nbp8-MDT0000: haven't heard from client 64cddadd-f2dd-1ea0-d732-3cfde5aaff45 (at 10.151.4.67@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e9a84e400, cur 1591589069 expire 1591588919 last 1591588842 [906975.398837] Lustre: Skipped 7 previous similar messages [906996.331246] Lustre: MGS: haven't heard from client 325d49e3-9a80-cfa4-06b3-f2f63cc76fb0 (at 10.151.4.67@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896abaa1dc00, cur 1591589090 expire 1591588940 last 1591588863 [906996.400776] Lustre: Skipped 9 previous similar messages [907024.637688] Lustre: MGS: Connection restored to 8ea48c80-5005-24f8-deca-3ba881aec483 (at 10.151.32.91@o2ib) [907024.637694] Lustre: Skipped 23 previous similar messages [907089.241768] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [907089.275243] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [907089.308737] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.60@o2ib (311): c: 30, oc: 0, rc: 32 [907089.349087] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [907092.242950] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [907092.276444] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.67@o2ib (322): c: 31, oc: 0, rc: 32 [907097.242150] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [907097.275656] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [907097.309143] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.77@o2ib (328): c: 30, oc: 0, rc: 32 [907097.349485] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [907705.578036] Lustre: MGS: Connection restored to 6fb10a1f-9ad6-2e5e-02f5-628b450dba48 (at 10.151.37.88@o2ib) [907705.578041] Lustre: Skipped 37 previous similar messages [908596.823149] Lustre: MGS: Connection restored to 85222724-004f-61a4-5c14-81d3db44c5ca (at 10.151.3.39@o2ib) [908596.823155] Lustre: Skipped 49 previous similar messages [909284.742198] Lustre: MGS: Connection restored to c338d321-79df-3eb6-28d4-33c4a39fe6b9 (at 10.141.3.113@o2ib417) [909284.742203] Lustre: Skipped 391 previous similar messages [909976.345533] Lustre: MGS: Connection restored to ae8b748b-e58b-638e-3de7-15cf5044ed96 (at 10.149.4.30@o2ib313) [909976.345539] Lustre: Skipped 9 previous similar messages [910683.312556] Lustre: MGS: Connection restored to dc6bcb8d-dae8-f4dd-fcc6-d24ca5898f77 (at 10.151.48.235@o2ib) [910683.312561] Lustre: Skipped 43 previous similar messages [911263.497478] Lustre: MGS: haven't heard from client d818c6d1-2cdc-266c-29b0-c9f05f1a137f (at 10.151.36.147@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89991ce55400, cur 1591593357 expire 1591593207 last 1591593130 [911263.567583] Lustre: Skipped 9 previous similar messages [911344.398718] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [911344.432217] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [911344.465706] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.130@o2ib (304): c: 30, oc: 0, rc: 32 [911344.506620] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [911354.680031] Lustre: MGS: Connection restored to 89d272e6-f0d5-b3d8-63a4-9d927be8db2a (at 10.149.3.15@o2ib313) [911354.680036] Lustre: Skipped 317 previous similar messages [911371.399722] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [911371.433221] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.184@o2ib (307): c: 30, oc: 0, rc: 32 [911380.399969] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [911380.433469] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [911380.466961] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.147@o2ib (344): c: 30, oc: 0, rc: 32 [911380.507877] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [911406.401011] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [911406.434505] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.99@o2ib (342): c: 30, oc: 0, rc: 32 [911894.510954] Lustre: nbp8-MDT0000: haven't heard from client bce4bee2-fa31-e4a2-0b55-b0740fac6b0b (at 10.149.1.4@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a6dafdc00, cur 1591593988 expire 1591593838 last 1591593761 [911894.583653] Lustre: Skipped 19 previous similar messages [912079.885844] Lustre: MGS: Connection restored to 1f07c8b7-6abd-85c1-ed26-4fbcfbed3661 (at 10.151.36.85@o2ib) [912079.885850] Lustre: Skipped 97 previous similar messages [912326.525982] Lustre: nbp8-MDT0000: haven't heard from client 64f636e9-7074-8f5e-7c27-6dd4a76046e9 (at 10.151.32.127@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cfee45000, cur 1591594420 expire 1591594270 last 1591594193 [912326.598655] Lustre: Skipped 19 previous similar messages [912335.460358] LNet: 45706:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.38.130@o2ib version 12/12 incarnation 1588818552291577/1591594375775030 [912422.438509] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [912422.472002] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [912422.505491] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.125@o2ib (314): c: 30, oc: 0, rc: 32 [912422.546410] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [912588.535210] Lustre: nbp8-MDT0000: haven't heard from client 432729d3-ea9d-f3ce-1936-93f1dc60e79f (at 10.151.38.119@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89750def3c00, cur 1591594682 expire 1591594532 last 1591594455 [912588.607887] Lustre: Skipped 19 previous similar messages [912682.448127] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [912682.481634] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [912682.514844] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.119@o2ib (321): c: 30, oc: 0, rc: 32 [912682.555765] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [912686.449149] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [912686.482643] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [912686.516129] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.126@o2ib (324): c: 30, oc: 0, rc: 32 [912686.557043] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [912707.448902] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [912707.482407] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [912707.515908] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.88@o2ib (341): c: 30, oc: 0, rc: 32 [912707.556544] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [912751.184641] Lustre: MGS: Connection restored to 943ea180-ea22-36eb-a53b-ff23c42c3358 (at 10.151.32.125@o2ib) [912751.184647] Lustre: Skipped 47 previous similar messages [913383.884762] Lustre: MGS: Connection restored to b7a13814-962b-db25-0204-d7dd9d91d9f0 (at 10.149.10.18@o2ib313) [913383.884767] Lustre: Skipped 47 previous similar messages [914040.126742] Lustre: MGS: Connection restored to f17d232c-a9ba-1e4c-4bee-eb8598f9e891 (at 10.149.5.78@o2ib313) [914040.126747] Lustre: Skipped 63 previous similar messages [914739.639213] Lustre: MGS: Connection restored to 40e0116a-c6f3-146c-4efd-146d1f8d7d89 (at 10.149.10.137@o2ib313) [914739.639219] Lustre: Skipped 257 previous similar messages [915363.701310] Lustre: MGS: Connection restored to 6098a340-8dcd-ea78-3a27-b3cf1fa99a6b (at 10.149.10.128@o2ib313) [915363.701316] Lustre: Skipped 79 previous similar messages [916005.222404] Lustre: MGS: Connection restored to 9294dc9d-5650-49cc-5b53-485d1d73f63e (at 10.151.7.95@o2ib) [916005.222410] Lustre: Skipped 191 previous similar messages [916940.589849] Lustre: MGS: Connection restored to 1e2c7e28-81ec-165e-dbaf-8e6599e43b01 (at 10.151.3.38@o2ib) [916940.589854] Lustre: Skipped 135 previous similar messages [917642.847823] Lustre: MGS: Connection restored to 923e6419-80cb-3f0b-e6ee-89094257852d (at 10.151.12.226@o2ib) [917642.847829] Lustre: Skipped 9 previous similar messages [917928.859787] Process accounting resumed [918555.762275] Lustre: MGS: haven't heard from client b70e2d0d-dbb3-aea6-ac33-b9dd85a0afb5 (at 10.151.28.135@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899abc337000, cur 1591600649 expire 1591600499 last 1591600422 [918555.832401] Lustre: Skipped 19 previous similar messages [918556.754279] Lustre: nbp8-MDT0000: haven't heard from client 9753fe3f-994f-5d0b-3144-019f89a3f765 (at 10.151.28.137@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996f2797400, cur 1591600650 expire 1591600500 last 1591600423 [918556.826984] Lustre: Skipped 9 previous similar messages [918633.667131] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [918633.700622] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [918633.733829] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.137@o2ib (304): c: 30, oc: 0, rc: 32 [918633.774742] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [918636.667306] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [918636.700812] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [918636.734014] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.144@o2ib (306): c: 30, oc: 0, rc: 32 [918636.774933] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [918680.668931] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [918680.702431] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [918680.735630] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.131@o2ib (321): c: 30, oc: 0, rc: 32 [918680.776542] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [918690.669288] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [918690.702792] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [918690.736001] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.151@o2ib (338): c: 31, oc: 0, rc: 32 [918690.776921] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [918694.039085] Lustre: MGS: Connection restored to 76528814-30f3-cf19-b159-c9f81638a007 (at 10.151.36.14@o2ib) [918694.039091] Lustre: Skipped 65 previous similar messages [919058.773249] Lustre: MGS: haven't heard from client bcef5bce-f17b-598f-b658-542310a195aa (at 10.151.36.15@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a96b90800, cur 1591601152 expire 1591601002 last 1591600925 [919058.843069] Lustre: Skipped 9 previous similar messages [919141.685922] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [919141.719428] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [919141.752922] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.15@o2ib (310): c: 30, oc: 0, rc: 32 [919141.793559] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [919191.687736] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [919191.721243] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 8 previous similar messages [919191.754736] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.14@o2ib (326): c: 30, oc: 0, rc: 32 [919191.795364] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 8 previous similar messages [919295.851116] Lustre: MGS: Connection restored to 8bc8e014-b57c-b3ae-c80f-ccfe3082d889 (at 10.151.37.43@o2ib) [919295.851122] Lustre: Skipped 27 previous similar messages [920392.098921] Lustre: MGS: Connection restored to 554558de-c83e-d15d-2eda-855d9fc4e0dd (at 10.149.9.249@o2ib313) [920392.098927] Lustre: Skipped 213 previous similar messages [921514.521654] Lustre: MGS: Connection restored to a959dd60-5302-5142-e726-5cad08e1d64f (at 10.151.30.13@o2ib) [921514.521660] Lustre: Skipped 1 previous similar message [922209.715173] Lustre: MGS: Connection restored to 9590db68-2e84-daf9-841c-82a44b1b6484 (at 10.149.4.205@o2ib313) [922209.715179] Lustre: Skipped 117 previous similar messages [922828.329835] Lustre: MGS: Connection restored to 923e6419-80cb-3f0b-e6ee-89094257852d (at 10.151.12.226@o2ib) [922828.329841] Lustre: Skipped 65 previous similar messages [923844.386214] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [923844.386219] Lustre: Skipped 25 previous similar messages [924885.101566] Lustre: MGS: Connection restored to 2a8c749a-faa9-679d-d624-dab038a4f66c (at 10.151.0.65@o2ib) [924885.101572] Lustre: Skipped 257 previous similar messages [925996.320513] Lustre: MGS: Connection restored to 9294dc9d-5650-49cc-5b53-485d1d73f63e (at 10.151.7.95@o2ib) [925996.320519] Lustre: Skipped 75 previous similar messages [926738.471016] Lustre: MGS: Connection restored to a6b0cf44-e130-57f5-bbc2-762bf02b19b9 (at 10.149.3.249@o2ib313) [926738.471022] Lustre: Skipped 105 previous similar messages [927551.912201] Lustre: MGS: Connection restored to 2a8c749a-faa9-679d-d624-dab038a4f66c (at 10.151.0.65@o2ib) [927551.912207] Lustre: Skipped 67 previous similar messages [928305.430241] Lustre: MGS: Connection restored to cd80cffb-9bbe-a67e-ea35-83f307a84a55 (at 10.151.32.43@o2ib) [928305.430247] Lustre: Skipped 55 previous similar messages [929014.521331] Lustre: MGS: Connection restored to 6e2067c6-8c8b-63e7-77ab-1e29fe662f4a (at 10.151.54.127@o2ib) [929014.521337] Lustre: Skipped 303 previous similar messages [929900.240094] Lustre: MGS: Connection restored to 485c6224-36fb-c0e3-303c-2b6314058842 (at 10.151.54.123@o2ib) [929900.240099] Lustre: Skipped 331 previous similar messages [930752.416238] Lustre: MGS: Connection restored to 98602cca-5247-e95a-5d39-6b5cc4b4f758 (at 10.151.18.169@o2ib) [930752.416243] Lustre: Skipped 99 previous similar messages [931375.392392] Lustre: MGS: Connection restored to 0d2009ad-6caa-9fea-43eb-11654a746448 (at 10.151.2.15@o2ib) [931375.392397] Lustre: Skipped 85 previous similar messages [932030.715992] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [932030.715998] Lustre: Skipped 235 previous similar messages [933098.815986] Lustre: MGS: Connection restored to c48318c7-9095-d9c4-8814-cff863a10076 (at 10.151.33.69@o2ib) [933098.815993] Lustre: Skipped 21 previous similar messages [934306.335996] Lustre: MGS: haven't heard from client 23e69f95-c0b6-4cb5-6924-3e424bcc3c1a (at 10.151.27.21@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89814d49b400, cur 1591616399 expire 1591616249 last 1591616172 [934306.405812] Lustre: Skipped 19 previous similar messages [934325.021502] LNet: 44454:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.21@o2ib version 12/12 incarnation 1590085887045914/1591616413511515 [934325.068741] LNet: 44454:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 6 previous similar messages [934325.106747] Lustre: MGS: Connection restored to 23e69f95-c0b6-4cb5-6924-3e424bcc3c1a (at 10.151.27.21@o2ib) [934325.106751] Lustre: Skipped 281 previous similar messages [934421.493490] Lustre: MGS: Connection restored to 9a45c937-275f-d5c7-dfb6-6a69741f0020 (at 10.141.3.199@o2ib417) [934421.493495] Lustre: Skipped 1 previous similar message [934995.430165] Lustre: MGS: Connection restored to 04ccb031-6eec-1beb-51f5-8361153ece89 (at 10.151.8.135@o2ib) [934995.430170] Lustre: Skipped 17 previous similar messages [936112.192448] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [936112.192454] Lustre: Skipped 71 previous similar messages [936814.426776] Lustre: MGS: Connection restored to 8b7e9a8a-d6d1-3818-9797-c6db7b01ce44 (at 10.151.32.23@o2ib) [936814.426781] Lustre: Skipped 1 previous similar message [936821.200182] Lustre: MGS: Connection restored to b6732f28-50c1-f84a-1087-9c954d997e38 (at 10.151.29.187@o2ib) [936821.200188] Lustre: Skipped 3 previous similar messages [937091.345983] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [937091.379470] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.166@o2ib (303): c: 32, oc: 0, rc: 32 [937093.444951] Lustre: nbp8-MDT0000: haven't heard from client 936f0ab1-eb43-13bb-2724-82424589ba74 (at 10.149.7.105@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a4b2ee800, cur 1591619186 expire 1591619036 last 1591618959 [937093.518210] Lustre: Skipped 1 previous similar message [937124.347086] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [937124.380589] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.90@o2ib (303): c: 32, oc: 0, rc: 32 [937135.347470] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [937135.380949] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.189@o2ib (303): c: 32, oc: 0, rc: 32 [937166.349620] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [937166.383119] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [937166.416322] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.19@o2ib (296): c: 32, oc: 0, rc: 32 [937166.456955] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [937377.233759] Lustre: MGS: Connection restored to 8ecab140-47d9-736c-ef59-f1e46a69b8d9 (at 10.149.10.90@o2ib313) [937377.233765] Lustre: Skipped 281 previous similar messages [937492.694460] Lustre: MGS: Connection restored to 4fd6ac9c-fef0-b139-533b-162ffaebcd2d (at 10.151.31.212@o2ib) [937492.694466] Lustre: Skipped 1 previous similar message [937975.049739] Lustre: MGS: Connection restored to a5000f76-b630-e3e6-937f-89732301666c (at 10.151.35.103@o2ib) [937975.049745] Lustre: Skipped 63 previous similar messages [938195.821796] Lustre: MGS: Connection restored to d30b2ab3-ba05-e108-07ec-e19e07bf6135 (at 10.151.54.120@o2ib) [938195.821802] Lustre: Skipped 17 previous similar messages [938346.872871] Lustre: MGS: Connection restored to e20ae8ef-c717-a7f8-79d6-cf504165927f (at 10.141.2.26@o2ib417) [938346.872877] Lustre: Skipped 23 previous similar messages [938764.171197] Lustre: MGS: Connection restored to 4940d7d0-3ccd-a0b7-a2f2-c93f5461d6dc (at 10.141.3.208@o2ib417) [938764.171203] Lustre: Skipped 487 previous similar messages [939477.527895] Lustre: MGS: Connection restored to a8f3fe90-d273-2ca9-60a0-d77ec58ad38c (at 10.151.7.41@o2ib) [939477.527901] Lustre: Skipped 173 previous similar messages [940198.599629] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [940198.599633] Lustre: Skipped 247 previous similar messages [940891.264469] Lustre: MGS: Connection restored to ecf4a68d-80b7-25a7-4638-2856a6d6f2ea (at 10.141.6.2@o2ib417) [940891.264475] Lustre: Skipped 45 previous similar messages [941939.673745] Lustre: MGS: Connection restored to 8e7e5410-047a-fa53-2289-c74fbd8c94ef (at 10.149.15.86@o2ib313) [941939.673750] Lustre: Skipped 41 previous similar messages [942715.966926] Lustre: MGS: Connection restored to dde07089-8ded-a14e-de9f-ddbe27e07f56 (at 10.151.14.126@o2ib) [942715.966932] Lustre: Skipped 87 previous similar messages [943690.984073] Lustre: MGS: Connection restored to 682b5f7c-872a-39d8-afe2-405c615dee0d (at 10.151.42.250@o2ib) [943690.984079] Lustre: Skipped 129 previous similar messages [944309.779599] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [944309.779605] Lustre: Skipped 379 previous similar messages [945018.731442] Lustre: MGS: Connection restored to f9cc82fd-1f00-e578-0fb2-2bdf9bb4927f (at 10.151.30.195@o2ib) [945018.731448] Lustre: Skipped 203 previous similar messages [945809.716229] Lustre: MGS: Connection restored to 62168799-ffd3-66ba-8c23-becd27eff00d (at 10.151.28.200@o2ib) [945809.716235] Lustre: Skipped 313 previous similar messages [946232.773184] Lustre: MGS: haven't heard from client 49aaf3d3-32c8-b98e-935f-630b00cae714 (at 10.153.10.93@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896abca74800, cur 1591628325 expire 1591628175 last 1591628098 [946232.843864] Lustre: Skipped 3 previous similar messages [946487.944678] Lustre: MGS: Connection restored to db1980d4-c722-1758-5576-ce33ef163c38 (at 10.151.23.161@o2ib) [946487.944684] Lustre: Skipped 207 previous similar messages [947123.129616] Lustre: MGS: Connection restored to 90973ce8-c928-5c7e-76ee-9aedf8beb9ac (at 10.151.4.192@o2ib) [947123.129622] Lustre: Skipped 223 previous similar messages [947785.141294] Lustre: MGS: Connection restored to 5fc8657f-cb68-f92f-4580-3ef40f0de0b7 (at 10.151.37.149@o2ib) [947785.141300] Lustre: Skipped 95 previous similar messages [948559.954920] Lustre: MGS: Connection restored to 6a87a9ac-7589-74df-6d27-db6247d26ad0 (at 10.151.30.145@o2ib) [948559.954926] Lustre: Skipped 201 previous similar messages [948616.860516] Lustre: MGS: haven't heard from client b36cdfb8-06bd-97d2-41d2-58c761405026 (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973c22e1c00, cur 1591630709 expire 1591630559 last 1591630482 [948616.931204] Lustre: Skipped 1 previous similar message [949179.264208] Lustre: MGS: Connection restored to 0561f4ab-2b7d-edfb-7990-5d4dfaf37684 (at 10.149.3.233@o2ib313) [949179.264214] Lustre: Skipped 605 previous similar messages [950015.045867] Lustre: MGS: Connection restored to 49984aa3-ff59-c685-5ddb-416575425dc0 (at 10.151.1.172@o2ib) [950015.045872] Lustre: Skipped 163 previous similar messages [950650.211385] Lustre: MGS: Connection restored to 3d73bc0d-e4d3-270b-a6fc-5ef002961e60 (at 10.151.1.43@o2ib) [950650.211391] Lustre: Skipped 73 previous similar messages [951490.193247] Lustre: MGS: Connection restored to ed4d1805-d4d9-2bb2-e5ad-67e245d2f7c9 (at 10.151.0.208@o2ib) [951490.193252] Lustre: Skipped 183 previous similar messages [952154.900765] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [952154.934272] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [952154.967481] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.32@o2ib (289): c: 32, oc: 0, rc: 32 [952155.008110] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [952184.902788] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [952184.936290] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.11@o2ib (295): c: 32, oc: 0, rc: 32 [952236.879973] Lustre: MGS: Connection restored to 9a089b3d-20ac-1204-093e-4de189a1af7d (at 10.149.10.98@o2ib313) [952236.879978] Lustre: Skipped 9 previous similar messages [952272.905046] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [952272.938533] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.65@o2ib (247): c: 32, oc: 0, rc: 32 [952458.911892] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [952458.945402] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.101@o2ib (286): c: 32, oc: 0, rc: 32 [952539.005093] Lustre: nbp8-MDT0000: haven't heard from client 1d654ba8-5c61-2b04-2a8f-203f95e43b22 (at 10.149.1.44@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f527c8800, cur 1591634631 expire 1591634481 last 1591634404 [952539.078060] Lustre: Skipped 3 previous similar messages [953088.247496] Lustre: MGS: Connection restored to 17b57d0d-d7a4-5a14-1f08-d9669af2d657 (at 10.151.2.92@o2ib) [953088.247502] Lustre: Skipped 5 previous similar messages [954234.439309] Lustre: MGS: Connection restored to dd686d47-537d-198d-e016-145772bcc3b2 (at 10.151.0.46@o2ib) [954234.439315] Lustre: Skipped 357 previous similar messages [954837.482805] Lustre: MGS: Connection restored to 5dd40a6e-c081-601c-3921-9af618660a06 (at 10.151.22.32@o2ib) [954837.482811] Lustre: Skipped 133 previous similar messages [954947.003141] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [954947.036641] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.58.239@o2ib (303): c: 32, oc: 0, rc: 32 [955455.441572] Lustre: MGS: Connection restored to 600635ec-6961-6b7d-43e5-371c4d116332 (at 10.151.3.104@o2ib) [955455.441577] Lustre: Skipped 195 previous similar messages [956128.534885] Lustre: MGS: Connection restored to 646bb563-40b1-9b78-306e-1ac9dbf29d7f (at 10.151.56.153@o2ib) [956128.534891] Lustre: Skipped 225 previous similar messages [956740.446562] Lustre: MGS: Connection restored to 36ff6857-1a08-5eb7-0e31-7cf75896cfe8 (at 10.151.23.27@o2ib) [956740.446567] Lustre: Skipped 83 previous similar messages [957470.170937] Lustre: MGS: Connection restored to 8f974db2-0e68-d880-3d29-b9bde5448f84 (at 10.151.35.156@o2ib) [957470.170943] Lustre: Skipped 397 previous similar messages [958129.252308] Lustre: MGS: Connection restored to 688e3b20-eca3-0a67-62f7-78f0b3833248 (at 10.151.24.93@o2ib) [958129.252313] Lustre: Skipped 285 previous similar messages [958732.530654] Lustre: MGS: Connection restored to 5b14c9e9-63bf-4331-b50e-0f7972e9bfe7 (at 10.151.55.231@o2ib) [958732.530660] Lustre: Skipped 135 previous similar messages [959405.444002] Lustre: MGS: Connection restored to d9d575d9-a4a3-9782-69c3-b03aeecfcbc2 (at 10.151.37.122@o2ib) [959405.444008] Lustre: Skipped 215 previous similar messages [960046.554443] Lustre: MGS: Connection restored to 9112ccf6-6ec7-27ee-899c-124b8a659528 (at 10.151.11.140@o2ib) [960046.554449] Lustre: Skipped 63 previous similar messages [960774.664356] Lustre: MGS: Connection restored to 8d3dbdfd-4125-115f-5e9c-27c8874fbc59 (at 10.151.0.50@o2ib) [960774.664362] Lustre: Skipped 87 previous similar messages [961376.267015] Lustre: MGS: Connection restored to f446f38e-858c-76e9-3f71-1a7238071726 (at 10.151.7.156@o2ib) [961376.267020] Lustre: Skipped 203 previous similar messages [962007.172537] Lustre: MGS: Connection restored to c3e41625-7382-ced5-cd88-3ba3c8b9a5f1 (at 10.151.32.22@o2ib) [962007.172542] Lustre: Skipped 1145 previous similar messages [962280.364060] Lustre: MGS: haven't heard from client 1be05162-2ee9-b723-1b54-8f35b17875c6 (at 10.151.48.252@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897963e4a400, cur 1591644372 expire 1591644222 last 1591644145 [962280.434163] Lustre: Skipped 3 previous similar messages [962289.363975] Lustre: nbp8-MDT0000: haven't heard from client 771ff804-b067-7490-cd59-6f087f57a4da (at 10.151.48.252@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d447ccc00, cur 1591644381 expire 1591644231 last 1591644154 [962356.369220] Lustre: MGS: haven't heard from client 6d4396d0-3894-c5cd-7c8b-ecf84428f4d9 (at 10.151.36.230@o2ib) in 170 seconds. I think it's dead, and I am evicting it. exp ffff8973b0ac5400, cur 1591644448 expire 1591644298 last 1591644278 [962365.365230] Lustre: nbp8-MDT0000: haven't heard from client 448069c6-11fc-451d-4372-2cf02943ab2f (at 10.151.36.230@o2ib) in 179 seconds. I think it's dead, and I am evicting it. exp ffff899f473bcc00, cur 1591644457 expire 1591644307 last 1591644278 [962410.277197] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [962410.310690] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.48.252@o2ib (347): c: 31, oc: 0, rc: 32 [962529.281656] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [962529.315158] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.230@o2ib (342): c: 30, oc: 0, rc: 32 [962560.373635] Lustre: MGS: haven't heard from client 065ab2ce-452b-bfeb-b169-1cdd701b0e6f (at 10.151.47.214@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974acf72400, cur 1591644652 expire 1591644502 last 1591644425 [962636.382931] Lustre: MGS: haven't heard from client 42ecfede-c233-c957-417f-cb849fb44073 (at 10.151.36.229@o2ib) in 158 seconds. I think it's dead, and I am evicting it. exp ffff8973b0ac3800, cur 1591644728 expire 1591644578 last 1591644570 [962636.453056] Lustre: Skipped 1 previous similar message [962655.566407] Lustre: MGS: Connection restored to f136143e-40ee-3753-f02b-7db52f6a7330 (at 10.151.29.174@o2ib) [962655.566413] Lustre: Skipped 23 previous similar messages [962667.287627] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [962667.321132] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.214@o2ib (333): c: 30, oc: 0, rc: 32 [962781.291929] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [962781.325430] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.229@o2ib (302): c: 30, oc: 0, rc: 32 [963262.725605] Lustre: MGS: Connection restored to 4aef2e7e-6ff9-8166-d84c-41cb761d8763 (at 10.151.7.158@o2ib) [963262.725610] Lustre: Skipped 79 previous similar messages [963264.397860] Lustre: nbp8-MDT0000: haven't heard from client 5b7d513f-9b28-a97e-1990-33657e1e31d3 (at 10.151.36.228@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899bc87acc00, cur 1591645356 expire 1591645206 last 1591645129 [963264.470543] Lustre: Skipped 1 previous similar message [963387.314101] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [963387.347602] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.228@o2ib (348): c: 30, oc: 0, rc: 32 [963468.405145] Lustre: nbp8-MDT0000: haven't heard from client 68372738-e910-516f-5941-a0ad0de4199f (at 10.151.53.22@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c9bd3c400, cur 1591645560 expire 1591645410 last 1591645333 [963468.477531] Lustre: Skipped 1 previous similar message [963516.936573] LNet: 638:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.53.22@o2ib version 12/12 incarnation 1589109801821230/1591645511857493 [963876.385887] Lustre: MGS: Connection restored to f3fe66a6-33bf-adc3-8e9a-7349c7723ce9 (at 10.151.32.19@o2ib) [963876.385893] Lustre: Skipped 181 previous similar messages [964691.378785] Lustre: MGS: Connection restored to a68a17b7-894e-b919-0489-abdb279813fc (at 10.151.29.196@o2ib) [964691.378790] Lustre: Skipped 393 previous similar messages [964886.368181] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [964886.401669] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.250@o2ib (273): c: 32, oc: 0, rc: 32 [964898.368710] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [964898.402210] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.214@o2ib (246): c: 32, oc: 0, rc: 32 [965418.388816] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [965418.422315] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.51.249@o2ib (303): c: 32, oc: 0, rc: 32 [965446.319674] Lustre: MGS: Connection restored to c3e41625-7382-ced5-cd88-3ba3c8b9a5f1 (at 10.151.32.22@o2ib) [965446.319680] Lustre: Skipped 45 previous similar messages [965506.390967] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [965506.424469] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.217@o2ib (236): c: 32, oc: 0, rc: 32 [965606.394752] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [965606.428238] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.216@o2ib (303): c: 32, oc: 0, rc: 32 [966070.564883] Lustre: MGS: Connection restored to 47825875-706c-894a-0d83-96324651ab75 (at 10.151.0.72@o2ib) [966070.564889] Lustre: Skipped 41 previous similar messages [966708.671611] Lustre: MGS: Connection restored to d0b7e9f8-5f87-449a-8a55-9c5cac5fbb60 (at 10.151.23.178@o2ib) [966708.671615] Lustre: Skipped 319 previous similar messages [967440.922178] Lustre: MGS: Connection restored to c5ae7bea-f5b1-486e-1fac-606c20aeddb8 (at 10.149.10.39@o2ib313) [967440.922184] Lustre: Skipped 21 previous similar messages [968074.352453] Lustre: MGS: Connection restored to dacdc2ab-47cc-0019-e3c0-6d6c87525a2c (at 10.151.0.61@o2ib) [968074.352458] Lustre: Skipped 151 previous similar messages [968368.588231] Lustre: MGS: haven't heard from client fa938946-2fb9-31a7-c436-0f24967b3995 (at 10.151.63.40@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897480afd800, cur 1591650460 expire 1591650310 last 1591650233 [968368.658048] Lustre: Skipped 1 previous similar message [968466.499817] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [968466.533318] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.40@o2ib (324): c: 30, oc: 0, rc: 32 [968747.885633] Lustre: MGS: Connection restored to da337552-f3ab-870e-6d50-9bb58828e878 (at 10.149.10.59@o2ib313) [968747.885639] Lustre: Skipped 159 previous similar messages [969594.202627] Lustre: MGS: Connection restored to 26e2e30a-81eb-fec3-4a4b-a9100f3b69fe (at 10.151.19.34@o2ib) [969594.202632] Lustre: Skipped 77 previous similar messages [970298.165398] Lustre: MGS: Connection restored to 82e93b3e-be0b-ee5f-a368-19e1f90008ce (at 10.149.15.48@o2ib313) [970298.165404] Lustre: Skipped 301 previous similar messages [970898.277838] Lustre: MGS: Connection restored to f14dafe9-c388-dd16-441f-94320a3001dc (at 10.151.55.148@o2ib) [970898.277844] Lustre: Skipped 53 previous similar messages [971248.691119] Lustre: nbp8-MDT0000: haven't heard from client 77bd1322-d770-a6bf-9a7f-02c0f692124d (at 10.153.17.232@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899dce645800, cur 1591653340 expire 1591653190 last 1591653113 [971248.764653] Lustre: Skipped 1 previous similar message [971587.692216] Lustre: MGS: Connection restored to 00ceda12-6657-7bcd-4802-edec0f1dc9d1 (at 10.151.49.252@o2ib) [971587.692221] Lustre: Skipped 71 previous similar messages [971606.708099] Lustre: MGS: haven't heard from client 8a959039-e882-8c24-5b3c-c4020e0c0324 (at 10.149.1.63@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8972e98d2000, cur 1591653698 expire 1591653548 last 1591653471 [971606.778490] Lustre: Skipped 1 previous similar message [971895.716321] Lustre: MGS: haven't heard from client ca6880ef-173d-dd87-44d6-c6eddae5db6e (at 10.153.16.104@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973b62b7400, cur 1591653987 expire 1591653837 last 1591653760 [971895.787281] Lustre: Skipped 1 previous similar message [971913.715805] Lustre: nbp8-MDT0000: haven't heard from client 24a4e7c4-73b6-e122-d471-02865c3e703b (at 10.153.16.104@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ef99fb000, cur 1591654005 expire 1591653855 last 1591653778 [971913.789350] Lustre: Skipped 1 previous similar message [972587.652449] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [972587.685950] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.231@o2ib (303): c: 32, oc: 0, rc: 32 [973506.980904] Lustre: MGS: Connection restored to 0773b5f7-08d7-b1cf-ba3d-7ce46d84f19b (at 10.149.10.8@o2ib313) [973506.980909] Lustre: Skipped 125 previous similar messages [974500.126762] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [974500.126767] Lustre: Skipped 1 previous similar message [974636.034879] Lustre: MGS: Connection restored to 81df8a59-e8bc-88a0-d563-cb62440a832e (at 10.149.14.149@o2ib313) [974636.034885] Lustre: Skipped 1 previous similar message [975349.843541] Lustre: nbp8-MDT0000: haven't heard from client 2541175d-26ec-f8dc-0bf0-1b0a6f32131a (at 10.149.2.37@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d3fbac400, cur 1591657441 expire 1591657291 last 1591657214 [975349.916509] Lustre: Skipped 1 previous similar message [975356.753284] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975356.786772] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.51@o2ib (219): c: 32, oc: 0, rc: 32 [975357.753314] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975357.786807] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.25@o2ib (303): c: 32, oc: 0, rc: 32 [975359.753391] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975359.786881] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [975359.820075] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.142@o2ib (304): c: 32, oc: 0, rc: 32 [975359.861005] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [975364.846799] Lustre: MGS: haven't heard from client 2576f227-1727-6a9b-4217-2ac89de807c6 (at 10.149.2.37@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3c2756800, cur 1591657456 expire 1591657306 last 1591657229 [975364.917231] Lustre: Skipped 1 previous similar message [975372.753809] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975372.787309] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.66@o2ib (303): c: 32, oc: 0, rc: 32 [975379.754069] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975379.787570] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [975379.821056] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.181@o2ib (279): c: 32, oc: 0, rc: 32 [975379.861976] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [975416.755425] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975416.788923] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [975416.822406] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.93@o2ib (304): c: 32, oc: 0, rc: 32 [975416.863042] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [975435.757205] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975435.790698] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [975435.824182] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.235@o2ib (304): c: 32, oc: 0, rc: 32 [975435.865106] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [975468.757394] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975468.790884] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [975468.824369] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.162@o2ib (345): c: 31, oc: 0, rc: 32 [975468.865013] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [975558.738811] Lustre: MGS: Connection restored to 7a16ca4d-058c-9db9-312f-5104bb1b3b4f (at 10.153.17.83@o2ib233) [975558.738816] Lustre: Skipped 59 previous similar messages [975562.927933] Lustre: MGS: Connection restored to d7d5f4d2-fa59-a0c6-7951-ba23829fad41 (at 10.153.16.104@o2ib233) [975562.927939] Lustre: Skipped 5 previous similar messages [975675.855439] Lustre: nbp8-MDT0000: haven't heard from client f53af66a-f9d5-5a56-d0eb-7af40c100bab (at 10.149.2.183@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d3fbac800, cur 1591657767 expire 1591657617 last 1591657540 [975675.928696] Lustre: Skipped 1 previous similar message [975678.765042] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975678.798535] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [975678.832019] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.177@o2ib (304): c: 32, oc: 0, rc: 32 [975678.872651] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [975751.863849] Lustre: nbp8-MDT0000: haven't heard from client 1b4d478e-4d9d-dc18-0ad8-7043fd69ad5c (at 10.153.16.104@o2ib233) in 179 seconds. I think it's dead, and I am evicting it. exp ffff8973b1f15400, cur 1591657843 expire 1591657693 last 1591657664 [975751.937380] Lustre: Skipped 1 previous similar message [975827.868921] Lustre: nbp8-MDT0000: haven't heard from client 2708cbbd-4aff-fb59-5a73-1d37c201e28e (at 10.153.17.137@o2ib233) in 185 seconds. I think it's dead, and I am evicting it. exp ffff899a83250c00, cur 1591657919 expire 1591657769 last 1591657734 [975827.942479] Lustre: Skipped 1 previous similar message [975942.774849] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [975942.808355] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.164@o2ib (286): c: 32, oc: 0, rc: 32 [976879.293322] Lustre: MGS: Connection restored to c012055f-46ab-ae95-b823-1755f217ce32 (at 10.149.14.3@o2ib313) [976879.293327] Lustre: Skipped 293 previous similar messages [976880.188572] Lustre: nbp8-MDT0000: Connection restored to 3d7f8e38-2748-2bf0-5144-79c215f01378 (at 10.151.23.149@o2ib) [976880.188578] Lustre: Skipped 2 previous similar messages [976924.256856] Lustre: MGS: Connection restored to 35681b0e-2ab5-afa8-7102-f8396042ac7c (at 10.151.56.172@o2ib) [976990.366589] Lustre: MGS: Connection restored to 8a5a6f4c-774d-d9f5-3107-d0d5601a4f9f (at 10.151.52.30@o2ib) [976990.366595] Lustre: Skipped 1 previous similar message [976997.006641] Lustre: MGS: Connection restored to afd77f48-f5d3-32bf-c4f4-c0d9438d0a92 (at 10.151.52.36@o2ib) [976997.006647] Lustre: Skipped 1 previous similar message [977030.601351] Lustre: MGS: Connection restored to 6f5f0e57-acb8-2d08-571b-8ba31de38ea0 (at 10.151.56.49@o2ib) [977030.601355] Lustre: Skipped 17 previous similar messages [977049.909188] Lustre: nbp8-MDT0000: Connection restored to 4eaf2b84-07e2-acaa-7d6c-679e1d423f46 (at 10.151.7.218@o2ib) [977049.909194] Lustre: Skipped 428 previous similar messages [977088.627577] Lustre: MGS: Connection restored to 6bfc8067-74b8-4922-c842-aec0be69aba9 (at 10.149.2.153@o2ib313) [977088.627583] Lustre: Skipped 282 previous similar messages [977394.779834] Lustre: MGS: Connection restored to 11a76c06-767e-4805-9c13-08a564dddcaa (at 10.151.7.113@o2ib) [977394.779839] Lustre: Skipped 267 previous similar messages [977731.724525] Lustre: MGS: Connection restored to c51d49af-1093-04d6-f123-a195629916a8 (at 10.151.19.180@o2ib) [977731.724530] Lustre: Skipped 437 previous similar messages [977781.842467] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [977781.875975] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [977781.909468] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.222@o2ib (304): c: 32, oc: 0, rc: 32 [977781.950389] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [977837.845513] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [977837.879006] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.151@o2ib (322): c: 31, oc: 0, rc: 32 [978428.866160] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [978428.899666] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.134@o2ib (309): c: 31, oc: 0, rc: 32 [978556.395859] Lustre: MGS: Connection restored to 28f49415-e63e-7384-e64c-40c90062d786 (at 10.151.28.95@o2ib) [978556.395865] Lustre: Skipped 87 previous similar messages [978603.872622] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [978603.906107] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [978603.939600] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.81@o2ib (227): c: 32, oc: 0, rc: 32 [978603.980227] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [978809.979034] Lustre: MGS: haven't heard from client 20f8362a-69d6-1e7e-034b-0890520c8fc2 (at 10.149.15.37@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a324f56800, cur 1591660901 expire 1591660751 last 1591660674 [978810.049712] Lustre: Skipped 3 previous similar messages [979007.887358] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [979007.920858] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [979007.954361] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.176@o2ib (290): c: 32, oc: 0, rc: 32 [979007.995005] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [979198.985148] Lustre: nbp8-MDT0000: haven't heard from client nbp8-MDT0000-lwp-OST0033_UUID (at 10.151.27.87@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a35a94a800, cur 1591661290 expire 1591661140 last 1591661063 [979233.895873] Lustre: 5528:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1591661122/real 1591661324] req@ffff89a391424800 x1667960070723328/t0(0) o400->nbp8-OST00cf-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591661745 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [979233.895886] Lustre: nbp8-OST0033-osc-MDT0000: Connection to nbp8-OST0033 (at 10.151.27.87@o2ib) was lost; in progress operations using this service will wait for recovery to complete [979234.044659] Lustre: 5528:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 28 previous similar messages [979284.897548] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.151.27.87@o2ib: 53 seconds [979284.931429] Lustre: 5524:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1591661247/real 1591661375] req@ffff89a3eab1f080 x1667960071230080/t0(0) o400->nbp8-OST0033-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591661870 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [979285.026412] Lustre: 5524:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 32 previous similar messages [979334.899463] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.151.27.87@o2ib: 78 seconds [979334.933240] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 31 previous similar messages [979334.964347] Lustre: 5527:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1591661297/real 1591661425] req@ffff899f7af3b600 x1667960071425664/t0(0) o400->nbp8-OST00e9-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591661920 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [979335.059338] Lustre: 5527:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 22 previous similar messages [979343.380651] Lustre: MGS: Connection restored to 8de51225-7556-e888-a78a-95d9aea4ba44 (at 10.151.36.105@o2ib) [979343.380656] Lustre: Skipped 2269 previous similar messages [979385.901264] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.151.27.87@o2ib: 154 seconds [979385.935334] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 23 previous similar messages [979385.966572] Lustre: 5525:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1591661322/real 1591661476] req@ffff899feb2b3a80 x1667960071522816/t0(0) o400->nbp8-OST0019-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591661945 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [979386.061571] Lustre: 5525:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [979434.025039] Lustre: 5524:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1591661272/real 1591661525] req@ffff89a3bc88a880 x1667960071327936/t0(0) o400->nbp8-OST0067-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591661895 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [979434.120026] Lustre: 5524:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 9 previous similar messages [979536.906866] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.151.27.87@o2ib: 1 seconds [979536.940356] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 16 previous similar messages [979666.911545] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [979666.945037] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [979666.978237] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.182@o2ib (352): c: 31, oc: 0, rc: 32 [979667.019159] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [979890.919795] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Timed out tx for 10.151.27.87@o2ib: 30 seconds [979890.953581] LNet: 4812:0:(o2iblnd_cb.c:3397:kiblnd_check_conns()) Skipped 2 previous similar messages [979905.922519] LNetError: 80657:0:(o2iblnd_cb.c:2962:kiblnd_rejected()) 10.151.27.87@o2ib rejected: o2iblnd fatal error [979960.106787] Lustre: nbp8-MDT0000: Connection restored to ca77e8f1-b849-65b0-f747-01ef064e01a8 (at 10.149.10.10@o2ib313) [979960.106792] Lustre: Skipped 15 previous similar messages [980202.763494] LNet: 80657:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.4.143@o2ib version 12/12 incarnation 1588819430665534/1591662236857923 [980581.154249] Lustre: MGS: Connection restored to 70736745-4474-a54a-2e12-2b25988434ca (at 10.151.38.24@o2ib) [980581.154254] Lustre: Skipped 256 previous similar messages [980918.183418] Lustre: 5525:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591661082/real 1591661082] req@ffff899fc72d0d80 x1667960070569472/t0(0) o13->nbp8-OST0103-osc-MDT0000@10.151.27.87@o2ib:7/4 lens 224/368 e 0 to 1 dl 1591661705 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [980918.276406] Lustre: 5525:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [981193.553507] Lustre: 5528:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591661084/real 1591661084] req@ffff89a254e3c380 x1667960070574144/t0(0) o13->nbp8-OST00cf-osc-MDT0000@10.151.27.87@o2ib:7/4 lens 224/368 e 0 to 1 dl 1591661707 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [981269.502680] Lustre: MGS: Connection restored to 9b94e4b8-a656-a80c-47da-deac5951c76f (at 10.149.15.166@o2ib313) [981269.502686] Lustre: Skipped 113 previous similar messages [981428.976146] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [981429.009660] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 15 previous similar messages [981429.043432] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.74@o2ib (311): c: 31, oc: 0, rc: 32 [981429.084059] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 15 previous similar messages [981930.733207] Lustre: MGS: Connection restored to dcbd01eb-8a3c-76fa-cb48-94f9df9ffbbf (at 10.151.56.110@o2ib) [981930.733212] Lustre: Skipped 1149 previous similar messages [982212.094141] Lustre: MGS: haven't heard from client 1c671418-cb72-5963-7e34-bbc7c952835c (at 10.149.14.207@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a403abc400, cur 1591664303 expire 1591664153 last 1591664076 [982212.165110] Lustre: Skipped 12 previous similar messages [982215.959946] Lustre: 5519:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591661087/real 1591661087] req@ffff8971ebb75100 x1667960070586688/t0(0) o13->nbp8-OST0067-osc-MDT0000@10.151.27.87@o2ib:7/4 lens 224/368 e 0 to 1 dl 1591661710 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [982288.101419] Lustre: MGS: haven't heard from client 493321f2-7ba4-547d-5384-47e0f599bd1c (at 10.151.56.109@o2ib) in 219 seconds. I think it's dead, and I am evicting it. exp ffff897dc64dd800, cur 1591664379 expire 1591664229 last 1591664160 [982288.171537] Lustre: Skipped 1 previous similar message [982296.097922] Lustre: nbp8-MDT0000: haven't heard from client e44f129e-b911-4940-ff29-b947a9c97b21 (at 10.151.56.109@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a38905a400, cur 1591664387 expire 1591664237 last 1591664160 [982402.012761] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [982402.046246] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 9 previous similar messages [982402.079733] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.56.109@o2ib (333): c: 30, oc: 0, rc: 32 [982402.120649] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 9 previous similar messages [982766.267108] Lustre: 5527:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591661083/real 1591661083] req@ffff89a228ac2400 x1667960070573824/t0(0) o13->nbp8-OST0137-osc-MDT0000@10.151.27.87@o2ib:7/4 lens 224/368 e 0 to 1 dl 1591661706 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [982766.360097] Lustre: 5527:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [982767.083868] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [982767.083873] Lustre: Skipped 415 previous similar messages [983207.041351] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [983207.074846] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.185@o2ib (287): c: 32, oc: 0, rc: 32 [983827.351956] Lustre: MGS: Connection restored to ff238681-6c0a-f265-67d2-7bfa0ac9f628 (at 10.149.2.243@o2ib313) [983827.351961] Lustre: Skipped 201 previous similar messages [984377.084216] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [984377.117705] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [984377.150914] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.156@o2ib (277): c: 32, oc: 0, rc: 32 [984377.191833] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [984423.085856] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [984423.119362] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.143@o2ib (303): c: 31, oc: 0, rc: 32 [984459.087255] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [984459.120769] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [984459.153971] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.216@o2ib (340): c: 31, oc: 0, rc: 32 [984459.194889] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [984693.539694] Lustre: MGS: Connection restored to ef8b2b2d-8f09-3b00-19e1-fbc0bde84bc5 (at 10.149.9.209@o2ib313) [984693.539700] Lustre: Skipped 119 previous similar messages [985453.407455] Lustre: MGS: Connection restored to c116ca92-f11d-b385-da91-fe42ad76f4c3 (at 10.151.34.137@o2ib) [985453.407461] Lustre: Skipped 39 previous similar messages [985630.130161] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [985630.163662] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.215@o2ib (298): c: 32, oc: 0, rc: 32 [986121.794450] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [986121.794456] Lustre: Skipped 534 previous similar messages [986547.227872] Lustre: 5511:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591668014/real 1591668014] req@ffff897bdf3dde80 x1667960100236928/t0(0) o400->nbp8-OST0067-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591668637 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1 [986547.321754] Lustre: 5511:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [986554.415140] Lustre: 5511:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1591668022/real 1591668022] req@ffff897bdf3df980 x1667960100851968/t0(0) o400->nbp8-OST011d-osc-MDT0000@10.151.27.87@o2ib:28/4 lens 224/224 e 0 to 1 dl 1591668645 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1 [986554.509014] Lustre: 5511:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 1 previous similar message [986825.001722] Lustre: nbp8-OST0019-osc-MDT0000: Connection restored to 10.151.27.87@o2ib (at 10.151.27.87@o2ib) [986825.001727] Lustre: Skipped 12 previous similar messages [987471.186954] Lustre: MGS: Connection restored to 4d4139b6-3b5c-faa7-cbd5-a882c97e1843 (at 10.151.30.134@o2ib) [987471.186961] Lustre: Skipped 470 previous similar messages [988537.378679] Lustre: MGS: Connection restored to d4ce0bf4-959c-4180-2b96-34ef70adaeea (at 10.151.11.138@o2ib) [988537.378684] Lustre: Skipped 139 previous similar messages [989137.525463] Lustre: nbp8-MDT0000: Connection restored to c17da790-07c1-d4c9-c99c-304f2187b50e (at 10.151.52.20@o2ib) [989137.525468] Lustre: Skipped 90 previous similar messages [989742.505978] Lustre: MGS: Connection restored to bfa66a46-1a1d-3420-d9f0-25f29c571124 (at 10.151.45.20@o2ib) [989742.505984] Lustre: Skipped 252 previous similar messages [990415.894275] Lustre: MGS: Connection restored to 087df1c2-29a0-d9fa-e791-106ffd9df25c (at 10.151.56.28@o2ib) [990415.894281] Lustre: Skipped 167 previous similar messages [991028.707626] Lustre: MGS: Connection restored to 8e307473-0bac-9d67-4c96-fbd2942595e0 (at 10.149.14.24@o2ib313) [991028.707632] Lustre: Skipped 59 previous similar messages [991710.758030] Lustre: MGS: Connection restored to c7b3a164-a12d-837f-a3e9-efde29e73165 (at 10.151.19.203@o2ib) [991710.758035] Lustre: Skipped 291 previous similar messages [992322.584540] Lustre: MGS: Connection restored to 8c1bd8bc-9be6-7328-fa72-16ae8e505a12 (at 10.149.2.227@o2ib313) [992322.584545] Lustre: Skipped 109 previous similar messages [992815.485189] Lustre: nbp8-MDT0000: haven't heard from client 59bf31d9-79e7-901a-b53f-604dd2ec334d (at 10.149.3.204@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998cd2c9800, cur 1591674906 expire 1591674756 last 1591674679 [993031.976422] Lustre: MGS: Connection restored to 9051940e-5906-8e2a-a43b-16cc42990b46 (at 10.151.54.101@o2ib) [993031.976427] Lustre: Skipped 179 previous similar messages [993635.287727] Lustre: MGS: Connection restored to 71c87d53-c72b-cd9c-42aa-ab5a9c65a67c (at 10.149.3.241@o2ib313) [993635.287732] Lustre: Skipped 13 previous similar messages [993930.436335] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [993930.469829] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.187@o2ib (216): c: 32, oc: 0, rc: 32 [993942.436760] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [993942.470268] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.42@o2ib (221): c: 32, oc: 0, rc: 32 [994010.439193] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [994010.472691] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.12@o2ib (303): c: 32, oc: 0, rc: 32 [994032.440025] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [994032.473527] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.188@o2ib (303): c: 32, oc: 0, rc: 32 [994238.374233] Lustre: MGS: Connection restored to 88520345-6585-514a-7c42-dce3bd8628de (at 10.151.44.158@o2ib) [994238.374239] Lustre: Skipped 197 previous similar messages [994845.640918] Lustre: MGS: Connection restored to d4800b89-24b7-51b4-e281-fc0f84c3307e (at 10.151.57.152@o2ib) [994845.640924] Lustre: Skipped 93 previous similar messages [995041.568307] Lustre: nbp8-MDT0000: haven't heard from client 2bd63bef-3ea6-1b74-1f0c-3f254a08d93f (at 10.151.37.61@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998c3bd8000, cur 1591677132 expire 1591676982 last 1591676905 [995041.640716] Lustre: Skipped 1 previous similar message [995146.481251] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [995146.514746] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.61@o2ib (331): c: 30, oc: 0, rc: 32 [995551.932335] Lustre: MGS: Connection restored to d38dd9b7-dfbf-65ea-8796-18d2d22159ce (at 10.151.50.159@o2ib) [995551.932341] Lustre: Skipped 187 previous similar messages [995738.502974] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [995738.536466] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.156@o2ib (299): c: 32, oc: 0, rc: 32 [996190.679404] Lustre: MGS: Connection restored to 6615d197-b537-f4cf-efa7-5b1037dff3c3 (at 10.151.4.35@o2ib) [996190.679410] Lustre: Skipped 161 previous similar messages [996805.542416] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [996805.575916] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.213@o2ib (297): c: 32, oc: 0, rc: 32 [996810.542609] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [996810.576102] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.223@o2ib (303): c: 32, oc: 0, rc: 32 [996834.696402] Lustre: MGS: Connection restored to 7d303cb4-41f2-8284-eeb9-c4f1abe168f1 (at 10.151.56.60@o2ib) [996834.696407] Lustre: Skipped 75 previous similar messages [996841.543693] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [996841.577193] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.232@o2ib (284): c: 32, oc: 0, rc: 32 [996922.888954] LNet: 60513:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.55.171@o2ib version 12/12 incarnation 1589141592495532/1591678991112113 [996922.936507] LNet: 60513:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 2 previous similar messages [997563.406038] Lustre: MGS: Connection restored to 27f6bc99-fa46-3342-3425-a80574d77548 (at 10.151.45.58@o2ib) [997563.406043] Lustre: Skipped 193 previous similar messages [998364.860942] Lustre: MGS: Connection restored to 8b21571a-96da-f137-65d4-705bccf9aa99 (at 10.151.34.136@o2ib) [998364.860948] Lustre: Skipped 195 previous similar messages [998964.930627] Lustre: nbp8-MDT0000: Connection restored to 943ea180-ea22-36eb-a53b-ff23c42c3358 (at 10.151.32.125@o2ib) [998964.930633] Lustre: Skipped 174 previous similar messages [999746.976254] Lustre: MGS: Connection restored to db4ee27c-e5e9-d53c-a88b-955452af9b06 (at 10.151.4.105@o2ib) [999746.976259] Lustre: Skipped 34 previous similar messages [1000186.408994] LNet: 80657:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.52.51@o2ib version 12/12 incarnation 1589130198652604/1591682257881996 [1000361.545710] Lustre: MGS: Connection restored to 7842a763-7e7e-f15b-23db-b9257f797a4e (at 10.151.32.189@o2ib) [1000361.545715] Lustre: Skipped 55 previous similar messages [1001079.991810] Lustre: MGS: Connection restored to 49ce8880-72b8-e3e9-3fe0-f12f13be0aa5 (at 10.151.47.77@o2ib) [1001079.991816] Lustre: Skipped 285 previous similar messages [1001219.704765] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1001219.738540] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.167@o2ib (303): c: 31, oc: 0, rc: 32 [1001246.705764] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1001246.739563] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1001246.773051] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.120@o2ib (331): c: 31, oc: 0, rc: 32 [1001246.814258] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1001248.705859] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1001248.739643] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.124@o2ib (331): c: 31, oc: 0, rc: 32 [1001252.706014] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1001252.739808] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1001252.773288] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.132@o2ib (335): c: 31, oc: 0, rc: 32 [1001252.814495] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1001257.706101] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1001257.739877] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.141@o2ib (340): c: 31, oc: 0, rc: 32 [1001267.706475] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1001267.740259] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1001267.773755] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.161@o2ib (349): c: 31, oc: 0, rc: 32 [1001267.814955] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1001422.136560] LNet: 80657:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.56.36@o2ib version 12/12 incarnation 1588859517164949/1591683480369172 [1001707.616594] Lustre: MGS: Connection restored to 972dcacb-ba5f-8ba1-3d53-0d817e2fed0d (at 10.151.10.204@o2ib) [1001707.616606] Lustre: Skipped 557 previous similar messages [1002334.644607] Lustre: MGS: Connection restored to bb4072cc-6e46-6c9c-9c28-977a41f542c0 (at 10.151.3.201@o2ib) [1002334.644613] Lustre: Skipped 165 previous similar messages [1002865.440434] LNet: 60513:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.54.50@o2ib version 12/12 incarnation 1588924737595386/1591684917053557 [1002997.862360] Lustre: MGS: haven't heard from client 554f6059-339c-46ef-e693-b4c3cc6c11c1 (at 10.151.50.85@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a0496ac000, cur 1591685088 expire 1591684938 last 1591684861 [1002997.932465] Lustre: Skipped 1 previous similar message [1003067.707730] Lustre: MGS: Connection restored to e08d5ac5-b5ac-4011-d8f0-75070c2d8b8c (at 10.151.4.111@o2ib) [1003067.707736] Lustre: Skipped 85 previous similar messages [1003084.773216] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1003084.807000] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1003084.840767] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.85@o2ib (314): c: 30, oc: 0, rc: 32 [1003084.881686] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1003678.573742] Lustre: MGS: Connection restored to 00b65975-706e-237b-a62e-8ad1a87881f6 (at 10.151.6.65@o2ib) [1003678.573747] Lustre: Skipped 183 previous similar messages [1004328.375859] Lustre: MGS: Connection restored to 9294dc9d-5650-49cc-5b53-485d1d73f63e (at 10.151.7.95@o2ib) [1004328.375865] Lustre: Skipped 29 previous similar messages [1004632.438138] Process accounting resumed [1005162.321757] Lustre: MGS: Connection restored to c5cd8558-c80d-2eac-bd6a-59b604778aff (at 10.151.32.52@o2ib) [1005162.321762] Lustre: Skipped 321 previous similar messages [1005676.869088] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1005676.902865] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.88@o2ib (284): c: 32, oc: 0, rc: 32 [1005766.263706] Lustre: MGS: Connection restored to 0b96ecbf-14c7-1460-760a-130f9f3fd46a (at 10.151.7.31@o2ib) [1005766.263711] Lustre: Skipped 147 previous similar messages [1006557.978205] Lustre: MGS: Connection restored to e0836992-1f62-d3aa-2f94-6b0b9e43d292 (at 10.151.47.242@o2ib) [1006557.978211] Lustre: Skipped 87 previous similar messages [1006741.995983] Lustre: nbp8-MDT0000: haven't heard from client 971f7a91-f4a8-bbcd-2167-fc39820110bc (at 10.151.1.208@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2d86a2c00, cur 1591688832 expire 1591688682 last 1591688605 [1006742.068689] Lustre: Skipped 1 previous similar message [1006822.911002] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1006822.944791] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.208@o2ib (306): c: 31, oc: 0, rc: 32 [1007192.674415] Lustre: MGS: Connection restored to 9294dc9d-5650-49cc-5b53-485d1d73f63e (at 10.151.7.95@o2ib) [1007192.674421] Lustre: Skipped 5 previous similar messages [1007247.925599] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1007247.959385] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.48@o2ib (328): c: 31, oc: 0, rc: 32 [1008082.413018] Lustre: MGS: Connection restored to bb4072cc-6e46-6c9c-9c28-977a41f542c0 (at 10.151.3.201@o2ib) [1008082.413024] Lustre: Skipped 35 previous similar messages [1008437.058917] Lustre: MGS: haven't heard from client 7e3df4f3-3287-c7f6-8433-b9b498c95c64 (at 10.151.11.93@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b7d3b6c00, cur 1591690527 expire 1591690377 last 1591690300 [1008437.129017] Lustre: Skipped 1 previous similar message [1008550.973029] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1008551.006816] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.93@o2ib (340): c: 30, oc: 0, rc: 32 [1008697.404636] Lustre: MGS: Connection restored to 56c34b46-107b-c417-59a1-22971c6a96f6 (at 10.151.7.29@o2ib) [1008697.404642] Lustre: Skipped 11 previous similar messages [1009313.764143] Lustre: MGS: Connection restored to c538ccde-4aa3-f5d4-3d05-4413453a60cb (at 10.151.3.65@o2ib) [1009313.764148] Lustre: Skipped 57 previous similar messages [1009917.942555] Lustre: MGS: Connection restored to 0bd9b7ca-0378-55cb-c2a8-81782ffe3c16 (at 10.151.3.50@o2ib) [1009917.942560] Lustre: Skipped 1845 previous similar messages [1010760.788077] Lustre: MGS: Connection restored to 6e57124e-fba3-98a7-adef-acf64a98e4cb (at 10.151.38.135@o2ib) [1010760.788082] Lustre: Skipped 17 previous similar messages [1010800.055252] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1010800.089047] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.201@o2ib (303): c: 32, oc: 0, rc: 32 [1011353.075518] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1011353.109288] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.109@o2ib (284): c: 32, oc: 0, rc: 32 [1011500.958451] Lustre: MGS: Connection restored to eceb1ba1-7cfb-266a-a523-80c4031c63cb (at 10.151.23.94@o2ib) [1011500.958457] Lustre: Skipped 65 previous similar messages [1012396.105053] Lustre: MGS: Connection restored to 8d043a39-b6de-1387-8467-148c1edfc73a (at 10.151.3.237@o2ib) [1012396.105059] Lustre: Skipped 257 previous similar messages [1012553.119529] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1012553.153313] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.29@o2ib (303): c: 32, oc: 0, rc: 32 [1013555.494745] Lustre: MGS: Connection restored to 3d443b54-e6b1-0f67-9fb5-7ef7ebfdc86e (at 10.151.17.177@o2ib) [1013555.494751] Lustre: Skipped 237 previous similar messages [1014171.648029] Lustre: MGS: Connection restored to e80db3c5-0534-ee6e-fead-35f902e17ce9 (at 10.141.7.65@o2ib417) [1014171.648034] Lustre: Skipped 297 previous similar messages [1014771.670056] Lustre: nbp8-MDT0000: Connection restored to 90604765-caff-7671-ea28-994c180d8eaf (at 10.151.0.58@o2ib) [1014771.670061] Lustre: Skipped 83 previous similar messages [1015431.429199] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [1015431.429205] Lustre: Skipped 385 previous similar messages [1016233.446475] Lustre: MGS: Connection restored to 49f481b0-dba9-7cfc-c2a1-685a8959124b (at 10.151.4.164@o2ib) [1016233.446480] Lustre: Skipped 85 previous similar messages [1016892.188807] Lustre: MGS: Connection restored to 65a2b494-0684-d1ff-2f94-468bfcb19806 (at 10.151.24.199@o2ib) [1016892.188812] Lustre: Skipped 337 previous similar messages [1017689.324123] Lustre: MGS: Connection restored to 2e3a6d7c-8dbd-97d3-92e2-7d36337df66d (at 10.141.6.223@o2ib417) [1017689.324129] Lustre: Skipped 19 previous similar messages [1018367.606483] Lustre: MGS: Connection restored to ae7f5195-05d8-c3ae-29a5-6dfe0474bd14 (at 10.149.2.250@o2ib313) [1018367.606489] Lustre: Skipped 439 previous similar messages [1019389.209940] Lustre: MGS: Connection restored to 5bd7d97e-86ae-cf0c-cb5e-230795c2bfbb (at 10.151.18.155@o2ib) [1019389.209946] Lustre: Skipped 23 previous similar messages [1019757.385980] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1019757.419772] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.138@o2ib (284): c: 32, oc: 0, rc: 32 [1019801.386615] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1019801.420401] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.51.31@o2ib (303): c: 32, oc: 0, rc: 32 [1020053.303983] Lustre: MGS: Connection restored to 307d0fde-8cf3-4b20-5a05-74ee35233e96 (at 10.151.24.43@o2ib) [1020053.303989] Lustre: Skipped 111 previous similar messages [1020616.505976] Lustre: nbp8-MDT0000: haven't heard from client 1f211c5b-7d58-33e6-1c30-d998bbe2bac6 (at 10.151.13.26@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89970e678000, cur 1591702706 expire 1591702556 last 1591702479 [1020616.578666] Lustre: Skipped 1 previous similar message [1020742.421377] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1020742.455163] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.26@o2ib (352): c: 30, oc: 0, rc: 32 [1020783.026492] Lustre: MGS: Connection restored to ecf4a68d-80b7-25a7-4638-2856a6d6f2ea (at 10.141.6.2@o2ib417) [1020783.026498] Lustre: Skipped 27 previous similar messages [1021021.431678] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1021021.465473] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.222@o2ib (304): c: 31, oc: 0, rc: 32 [1021045.432628] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1021045.466402] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.27@o2ib (327): c: 31, oc: 0, rc: 32 [1021065.433319] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1021065.467098] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.66@o2ib (347): c: 31, oc: 0, rc: 32 [1021498.980818] Lustre: MGS: Connection restored to c81ceb9a-57db-738f-4859-b8577b295a5a (at 10.149.3.97@o2ib313) [1021498.980823] Lustre: Skipped 81 previous similar messages [1022386.233037] Lustre: MGS: Connection restored to e175d4e3-1e46-a1b5-aaee-e3cc5a96d23e (at 10.149.15.147@o2ib313) [1022386.233043] Lustre: Skipped 79 previous similar messages [1023074.564748] Lustre: nbp8-MDT0000: Connection restored to bc685a4f-b38b-fad4-a8d9-e770a2869a6c (at 10.151.47.61@o2ib) [1023074.564753] Lustre: Skipped 390 previous similar messages [1023806.067513] Lustre: MGS: Connection restored to 15b1f41d-59fe-10c1-b45e-04d4ef27cde5 (at 10.151.37.127@o2ib) [1023806.067518] Lustre: Skipped 16 previous similar messages [1024051.543759] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1024051.577559] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1024051.611041] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.21@o2ib (334): c: 31, oc: 0, rc: 32 [1024051.651676] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1024626.311015] Lustre: MGS: Connection restored to 53e0e5e7-18c2-1cb8-3ca0-e1e0dd4507e1 (at 10.151.34.54@o2ib) [1024626.311020] Lustre: Skipped 37 previous similar messages [1026130.472400] Lustre: MGS: Connection restored to 81df8a59-e8bc-88a0-d563-cb62440a832e (at 10.149.14.149@o2ib313) [1026130.472406] Lustre: Skipped 167 previous similar messages [1026584.263676] Lustre: MGS: Connection restored to aeb269d8-ec7e-5b81-5100-33256f0743fb (at 10.149.13.55@o2ib313) [1026584.263682] Lustre: Skipped 60 previous similar messages [1027154.156402] Lustre: MGS: Connection restored to cb4822e0-ff41-f7e8-f955-4bf7e8a1bb1e (at 10.151.35.187@o2ib) [1027154.156408] Lustre: Skipped 58 previous similar messages [1027574.600435] Lustre: MGS: Connection restored to 5dee1a2b-1d7c-fe4c-9413-fb0319a28a88 (at 10.151.32.167@o2ib) [1027574.600441] Lustre: Skipped 11 previous similar messages [1028209.880832] Lustre: MGS: Connection restored to 4959b8b7-4143-abaa-d9a1-19efda94e515 (at 10.151.28.58@o2ib) [1028209.880838] Lustre: Skipped 149 previous similar messages [1028905.465098] Lustre: MGS: Connection restored to 89cc15ac-6f13-df2b-0e85-9d617cc09f00 (at 10.151.35.206@o2ib) [1028905.465104] Lustre: Skipped 31 previous similar messages [1029510.545604] Lustre: MGS: Connection restored to cc3d2dce-dd3c-85b8-25e9-848c34396312 (at 10.151.19.216@o2ib) [1029510.545610] Lustre: Skipped 463 previous similar messages [1030608.170231] Lustre: MGS: Connection restored to 858187eb-1489-d002-5e8c-38d04812ece2 (at 10.151.54.96@o2ib) [1030608.170236] Lustre: Skipped 3 previous similar messages [1031240.207696] Lustre: MGS: Connection restored to 2c9d4e57-42fd-a896-94e8-41c9202ddc0e (at 10.149.5.99@o2ib313) [1031240.207702] Lustre: Skipped 275 previous similar messages [1031832.830804] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1031832.864598] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.99@o2ib (303): c: 32, oc: 0, rc: 32 [1032135.245724] Lustre: MGS: Connection restored to c538ccde-4aa3-f5d4-3d05-4413453a60cb (at 10.151.3.65@o2ib) [1032135.245730] Lustre: Skipped 57 previous similar messages [1032739.868237] Lustre: MGS: Connection restored to 3d5bd4d8-26c1-97de-13b9-8641a12a520e (at 10.149.2.34@o2ib313) [1032739.868242] Lustre: Skipped 73 previous similar messages [1032950.963033] Lustre: MGS: haven't heard from client 6ff797d8-cafb-aa1d-26d1-3d802627c7a5 (at 10.141.3.78@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cfee43800, cur 1591715040 expire 1591714890 last 1591714813 [1032951.033721] Lustre: Skipped 1 previous similar message [1033282.987135] Lustre: MGS: haven't heard from client 49c815b1-811e-9fad-0cfb-f95f82bade64 (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89809e3dcc00, cur 1591715372 expire 1591715222 last 1591715145 [1033283.058099] Lustre: Skipped 1 previous similar message [1033287.974423] Lustre: nbp8-MDT0000: haven't heard from client 0c6363d8-6dfb-a299-ebb9-201a62daaabb (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8995de34e400, cur 1591715377 expire 1591715227 last 1591715150 [1033288.047971] Lustre: Skipped 1 previous similar message [1033436.262864] Lustre: MGS: Connection restored to 9b64af46-68f2-bd57-f47e-c2878cfd63cb (at 10.151.42.184@o2ib) [1033436.262875] Lustre: Skipped 453 previous similar messages [1034145.972271] Lustre: MGS: Connection restored to 1c9775ac-da7a-317b-f5a4-3d31d5bed146 (at 10.151.30.152@o2ib) [1034145.972276] Lustre: Skipped 231 previous similar messages [1034763.531767] Lustre: MGS: Connection restored to ffda25b4-1c2a-fbe3-d745-a212c77598ed (at 10.151.54.110@o2ib) [1034763.531773] Lustre: Skipped 85 previous similar messages [1035812.751318] Lustre: MGS: Connection restored to 29dde62b-0e2e-476a-1bb6-8c361754d9fe (at 10.149.2.199@o2ib313) [1035812.751323] Lustre: Skipped 39 previous similar messages [1036457.750513] Lustre: MGS: Connection restored to 2895710d-d29d-6819-06a0-8a48b976a4cd (at 10.149.10.58@o2ib313) [1036457.750518] Lustre: Skipped 523 previous similar messages [1036547.004334] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1036547.038118] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.65@o2ib (302): c: 32, oc: 0, rc: 32 [1036548.004382] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1036548.038169] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.67@o2ib (301): c: 32, oc: 0, rc: 32 [1036560.004851] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1036560.038637] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1036560.072130] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.49.124@o2ib (273): c: 32, oc: 0, rc: 32 [1036560.113333] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1036563.004943] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1036563.038723] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.144@o2ib (303): c: 32, oc: 0, rc: 32 [1036621.006990] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1036621.040777] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.158@o2ib (303): c: 32, oc: 0, rc: 32 [1037086.580430] Lustre: MGS: Connection restored to 1b7e04f8-4a7f-c705-04a0-21dce89fd0a5 (at 10.151.36.81@o2ib) [1037086.580435] Lustre: Skipped 557 previous similar messages [1037940.004792] Lustre: MGS: Connection restored to ff492679-28a1-9f19-cdfc-0f379ee2ee88 (at 10.151.54.104@o2ib) [1037940.004797] Lustre: Skipped 247 previous similar messages [1038547.706623] Lustre: MGS: Connection restored to 156add0e-d5f0-7e69-2d05-74d6d9719e08 (at 10.151.32.186@o2ib) [1038547.706628] Lustre: Skipped 107 previous similar messages [1038602.169096] Lustre: MGS: haven't heard from client 43417b20-9044-2d51-4300-78111f71cb53 (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982e75be000, cur 1591720691 expire 1591720541 last 1591720464 [1038602.240056] Lustre: Skipped 1 previous similar message [1038614.170694] Lustre: nbp8-MDT0000: haven't heard from client 77077957-369e-a22a-118c-6d6ed2955807 (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ff742400, cur 1591720703 expire 1591720553 last 1591720476 [1038970.093260] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1038970.127058] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1038970.160560] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.38@o2ib (288): c: 32, oc: 0, rc: 32 [1038970.201474] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1039279.433224] Lustre: MGS: Connection restored to 6e401d61-aeeb-ad8e-91fa-e00da621907c (at 10.151.24.71@o2ib) [1039279.433229] Lustre: Skipped 77 previous similar messages [1039885.363652] Lustre: MGS: Connection restored to feadee23-5be4-8ec9-eded-61b63f8a07bd (at 10.151.4.81@o2ib) [1039885.363658] Lustre: Skipped 77 previous similar messages [1040166.683941] LNet: 638:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.14.198@o2ib version 12/12 incarnation 1588869892706790/1591722190761726 [1040181.137674] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1040181.171444] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.36@o2ib (269): c: 32, oc: 0, rc: 32 [1040199.228633] Lustre: nbp8-MDT0000: haven't heard from client 6ed50a08-841b-ca72-f39f-f8227bffda45 (at 10.153.13.68@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997b331f400, cur 1591722288 expire 1591722138 last 1591722061 [1040508.291472] Lustre: MGS: Connection restored to 893931df-c52c-d77e-cef6-c68054d1ac4d (at 10.151.29.100@o2ib) [1040508.291477] Lustre: Skipped 279 previous similar messages [1040814.160976] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1040814.194757] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.84@o2ib (303): c: 32, oc: 0, rc: 32 [1040823.161250] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1040823.195036] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.56@o2ib (305): c: 31, oc: 0, rc: 32 [1041193.465434] Lustre: MGS: Connection restored to d38032a3-3e4d-922f-c667-dd50752ef09d (at 10.151.9.148@o2ib) [1041193.465439] Lustre: Skipped 83 previous similar messages [1042124.458348] Lustre: MGS: Connection restored to 0062c48e-fd95-e7c8-3c99-a3108eb3d264 (at 10.151.22.210@o2ib) [1042124.458354] Lustre: Skipped 297 previous similar messages [1042537.225246] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1042537.259028] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.249@o2ib (209): c: 32, oc: 0, rc: 32 [1042571.225404] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1042571.259187] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.201@o2ib (303): c: 32, oc: 0, rc: 32 [1042574.226575] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1042574.260362] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1042574.293849] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.222@o2ib (290): c: 32, oc: 0, rc: 32 [1042574.335057] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1042578.225725] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1042578.259510] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1042578.293005] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.230@o2ib (304): c: 32, oc: 0, rc: 32 [1042578.334204] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1042622.227342] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1042622.261125] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.40.210@o2ib (303): c: 32, oc: 0, rc: 32 [1042906.747757] Lustre: MGS: Connection restored to f14dafe9-c388-dd16-441f-94320a3001dc (at 10.151.55.148@o2ib) [1042906.747762] Lustre: Skipped 85 previous similar messages [1043254.250540] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1043254.284324] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1043254.318104] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.48.234@o2ib (336): c: 31, oc: 0, rc: 32 [1043254.359311] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1043568.524283] Lustre: MGS: Connection restored to 6f0910f4-8f64-ae8a-18a5-2fe6b92461ca (at 10.151.35.41@o2ib) [1043568.524288] Lustre: Skipped 153 previous similar messages [1043825.271515] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1043825.305306] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.217@o2ib (306): c: 31, oc: 0, rc: 32 [1044169.082547] Lustre: MGS: Connection restored to 3d234f71-d483-f5f5-3158-c54bb14df38b (at 10.151.32.180@o2ib) [1044169.082553] Lustre: Skipped 327 previous similar messages [1044867.567814] Lustre: MGS: Connection restored to 80bf753f-919a-7d64-0e61-8f4cda6fefd5 (at 10.151.3.239@o2ib) [1044867.567820] Lustre: Skipped 77 previous similar messages [1045601.651206] Lustre: MGS: Connection restored to 031b7382-ba9d-ae8a-7659-9ab72a690907 (at 10.151.53.77@o2ib) [1045601.651212] Lustre: Skipped 85 previous similar messages [1045773.342873] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1045773.376660] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.90@o2ib (227): c: 32, oc: 0, rc: 32 [1045809.344131] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1045809.377918] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.154@o2ib (303): c: 32, oc: 0, rc: 32 [1045853.345814] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1045853.379592] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1045853.413087] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.142@o2ib (304): c: 32, oc: 0, rc: 32 [1045853.454299] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1046520.477383] Lustre: MGS: Connection restored to f94b5f0c-6713-efba-5939-3070e91c2a0e (at 10.149.1.58@o2ib313) [1046520.477388] Lustre: Skipped 93 previous similar messages [1047281.055061] Lustre: MGS: Connection restored to 2b23b591-a19c-4d6f-31c8-d5c2441663c0 (at 10.151.52.134@o2ib) [1047281.055065] Lustre: Skipped 391 previous similar messages [1047963.423182] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1047963.456968] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.45@o2ib (232): c: 32, oc: 0, rc: 32 [1048008.278459] Lustre: MGS: Connection restored to aec35f14-d54b-88cd-1609-e315094e9e45 (at 10.151.3.172@o2ib) [1048008.278464] Lustre: Skipped 51 previous similar messages [1048147.429864] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1048147.463651] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.78@o2ib (303): c: 32, oc: 0, rc: 32 [1048626.441144] Lustre: MGS: Connection restored to 1f555acf-bd50-a4e5-1afc-91ac8c8c577f (at 10.151.29.170@o2ib) [1048626.441150] Lustre: Skipped 207 previous similar messages [1049154.558726] Lustre: MGS: haven't heard from client 549979d2-d47f-f28f-d010-adda46d5098d (at 10.151.12.130@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d44e42c00, cur 1591731243 expire 1591731093 last 1591731016 [1049154.629123] Lustre: Skipped 1 previous similar message [1049250.470342] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1049250.504114] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.130@o2ib (321): c: 30, oc: 0, rc: 32 [1049326.119688] Lustre: MGS: Connection restored to 328daad6-f92f-2ad0-50e0-f4976da7d75b (at 10.151.51.123@o2ib) [1049326.119694] Lustre: Skipped 13 previous similar messages [1049968.351099] Lustre: MGS: Connection restored to f6853fbf-0f1a-b19e-c782-fe1144dc72c7 (at 10.151.56.126@o2ib) [1049968.351104] Lustre: Skipped 137 previous similar messages [1050195.595743] Lustre: nbp8-MDT0000: haven't heard from client 262ddf6b-d492-bbcf-add1-171b7e4f439e (at 10.151.5.102@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d1aaa0400, cur 1591732284 expire 1591732134 last 1591732057 [1050195.668427] Lustre: Skipped 1 previous similar message [1050269.507787] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1050269.541573] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.39.104@o2ib (303): c: 32, oc: 0, rc: 32 [1050308.510279] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1050308.544067] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.100@o2ib (335): c: 30, oc: 0, rc: 32 [1050310.509286] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1050310.543074] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1050310.576845] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.103@o2ib (337): c: 30, oc: 0, rc: 32 [1050310.617757] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1050690.935339] Lustre: MGS: Connection restored to 61ea0174-0851-5990-e590-0534fdab580d (at 10.151.32.153@o2ib) [1050690.935344] Lustre: Skipped 305 previous similar messages [1051419.762362] Lustre: MGS: Connection restored to 44ce97f7-e720-c591-7489-ea93c85acbdb (at 10.151.50.139@o2ib) [1051419.762368] Lustre: Skipped 27 previous similar messages [1052027.398804] Lustre: MGS: Connection restored to 089d4ef2-0097-6e47-6111-e773f977b0e3 (at 10.151.31.29@o2ib) [1052027.398809] Lustre: Skipped 635 previous similar messages [1052668.770512] Lustre: MGS: Connection restored to 1c03a2f2-8238-c068-ce83-5e493704ce31 (at 10.151.34.142@o2ib) [1052668.770518] Lustre: Skipped 109 previous similar messages [1052750.599048] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1052750.632835] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.39.17@o2ib (291): c: 32, oc: 0, rc: 32 [1053293.918254] Lustre: MGS: Connection restored to 29dedb50-7cc6-c9bf-61ae-126dce434f2a (at 10.151.47.63@o2ib) [1053293.918260] Lustre: Skipped 85 previous similar messages [1053907.173506] Lustre: MGS: Connection restored to a7b9953f-8c62-cf9a-7d44-7a4717d720e5 (at 10.151.43.107@o2ib) [1053907.173511] Lustre: Skipped 225 previous similar messages [1054508.754356] Lustre: nbp8-MDT0000: haven't heard from client a4067226-1ca9-264e-4816-42e0992f540b (at 10.151.35.82@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e53e56400, cur 1591736597 expire 1591736447 last 1591736370 [1054508.827031] Lustre: Skipped 7 previous similar messages [1054524.664288] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1054524.698048] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.98@o2ib (293): c: 32, oc: 0, rc: 32 [1054543.394840] Lustre: MGS: Connection restored to 586ad635-400b-028e-8da8-8d8444c1452e (at 10.151.18.133@o2ib) [1054543.394846] Lustre: Skipped 101 previous similar messages [1054543.664927] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1054543.698719] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.175@o2ib (295): c: 32, oc: 0, rc: 32 [1054594.666811] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1054594.700598] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.82@o2ib (312): c: 30, oc: 0, rc: 32 [1055239.436582] Lustre: MGS: Connection restored to 5e604969-60f7-8b28-03e1-624aa76d6742 (at 10.153.17.93@o2ib233) [1055239.436591] Lustre: Skipped 119 previous similar messages [1055242.780977] Lustre: MGS: haven't heard from client 8ab6f42c-e5bc-fadf-a0ee-5704ac60d70b (at 10.153.14.21@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fc5faa000, cur 1591737331 expire 1591737181 last 1591737104 [1055242.851936] Lustre: Skipped 1 previous similar message [1055253.781089] Lustre: nbp8-MDT0000: haven't heard from client 1ddf0f9e-1ad3-0389-669e-2cf58ee473f5 (at 10.153.14.21@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8996f2794000, cur 1591737342 expire 1591737192 last 1591737115 [1055253.854622] Lustre: Skipped 1 previous similar message [1055633.799099] Lustre: MGS: haven't heard from client 86fc7e7e-5d77-3ab8-9fbe-585eecc1c198 (at 10.153.16.104@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c5f5ef800, cur 1591737722 expire 1591737572 last 1591737495 [1055633.870355] Lustre: Skipped 1 previous similar message [1055646.799706] Lustre: nbp8-MDT0000: haven't heard from client 3d42f5c6-113d-6340-733a-11c6326b137c (at 10.153.16.104@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c979d400, cur 1591737735 expire 1591737585 last 1591737508 [1055842.485771] Lustre: MGS: Connection restored to 866c4eff-b146-65e5-e50c-0f750d6dd23c (at 10.151.32.27@o2ib) [1055842.485777] Lustre: Skipped 39 previous similar messages [1055977.809844] Lustre: MGS: haven't heard from client 40125524-8ed9-8b66-c341-b6cd66b4ce52 (at 10.153.16.71@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974b5f80000, cur 1591738066 expire 1591737916 last 1591737839 [1055987.808066] Lustre: nbp8-MDT0000: haven't heard from client 9c8692a2-a4ec-e73f-77ce-9e1072cc7126 (at 10.153.16.71@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a1c2352400, cur 1591738076 expire 1591737926 last 1591737849 [1056459.134188] Lustre: MGS: Connection restored to 0f6abe20-2ce7-7a91-deab-ea99275f92d8 (at 10.151.34.75@o2ib) [1056459.134198] Lustre: Skipped 61 previous similar messages [1056747.835940] Lustre: MGS: haven't heard from client 6825c2f1-8caf-c451-166b-6f26ea7aef66 (at 10.153.17.232@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983625be400, cur 1591738836 expire 1591738686 last 1591738609 [1056771.836048] Lustre: nbp8-MDT0000: haven't heard from client a3b0d2ac-d3e0-7317-2de6-d73e8f384fe0 (at 10.153.17.232@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ab36b8c00, cur 1591738860 expire 1591738710 last 1591738633 [1057146.638230] Lustre: MGS: Connection restored to 1b3c2708-e86e-b1b6-d429-56c0fef74a72 (at 10.151.54.129@o2ib) [1057146.638236] Lustre: Skipped 247 previous similar messages [1057168.852325] Lustre: nbp8-MDT0000: haven't heard from client dbdff1e3-9993-d321-425f-33a2178c10f1 (at 10.153.17.224@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899756bd8800, cur 1591739257 expire 1591739107 last 1591739030 [1057330.856554] Lustre: MGS: haven't heard from client 0a7adeca-6ae4-ae73-d612-02261d79f061 (at 10.153.14.21@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982c319c800, cur 1591739419 expire 1591739269 last 1591739192 [1057330.927531] Lustre: Skipped 1 previous similar message [1057351.858272] Lustre: nbp8-MDT0000: haven't heard from client e36b7116-5c1b-841f-31b4-55f58602e47f (at 10.153.14.21@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8998f0361800, cur 1591739440 expire 1591739290 last 1591739213 [1057970.598043] Lustre: MGS: Connection restored to fd137b05-de4d-e994-140c-3cf68f895aef (at 10.151.12.63@o2ib) [1057970.598049] Lustre: Skipped 269 previous similar messages [1058578.516655] Lustre: MGS: Connection restored to 36bb3628-d6b4-827c-6cd5-e3f011304203 (at 10.151.2.102@o2ib) [1058578.516661] Lustre: Skipped 333 previous similar messages [1058820.822074] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1058820.855854] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.217@o2ib (302): c: 31, oc: 0, rc: 32 [1058844.823004] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1058844.856799] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.219@o2ib (321): c: 31, oc: 0, rc: 32 [1058848.824093] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1058848.857879] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.226@o2ib (328): c: 31, oc: 0, rc: 32 [1058856.823461] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1058856.857239] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1058856.890724] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.243@o2ib (334): c: 31, oc: 0, rc: 32 [1058856.931933] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1059200.497684] Lustre: MGS: Connection restored to 1058dcb2-b933-c149-91bc-0be07c9cb13d (at 10.151.14.122@o2ib) [1059200.497690] Lustre: Skipped 439 previous similar messages [1059425.934862] Lustre: MGS: haven't heard from client d2a839b4-40ba-6aab-f708-36428401eb4e (at 10.153.17.73@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ed1da6000, cur 1591741514 expire 1591741364 last 1591741287 [1059432.934633] Lustre: nbp8-MDT0000: haven't heard from client 77f2af8d-daea-582a-f255-1e4b2282c5a2 (at 10.153.17.73@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979528db400, cur 1591741521 expire 1591741371 last 1591741294 [1059813.818112] Lustre: MGS: Connection restored to 0ff215ad-d6ce-8493-1573-0528507cda94 (at 10.149.1.21@o2ib313) [1059813.818117] Lustre: Skipped 7 previous similar messages [1060510.243018] Lustre: MGS: Connection restored to 5393f2b0-b0ac-883e-a041-c9e5d6171d32 (at 10.151.38.247@o2ib) [1060510.243024] Lustre: Skipped 25 previous similar messages [1060581.977446] Lustre: nbp8-MDT0000: haven't heard from client 61d4f7ca-5f54-3c56-eed2-e5918a3a01cf (at 10.153.12.13@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f165bf400, cur 1591742670 expire 1591742520 last 1591742443 [1060601.887326] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1060601.921122] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.76@o2ib (303): c: 32, oc: 0, rc: 32 [1061333.328587] Lustre: MGS: Connection restored to e9f3bdf1-8f9e-e083-bb61-ae0c122aee8a (at 10.149.9.11@o2ib313) [1061333.328593] Lustre: Skipped 177 previous similar messages [1062241.203600] Lustre: MGS: Connection restored to f1353eab-1156-148d-1557-38818fd36175 (at 10.151.8.26@o2ib) [1062241.203605] Lustre: Skipped 109 previous similar messages [1062854.119726] Lustre: MGS: Connection restored to 7842a763-7e7e-f15b-23db-b9257f797a4e (at 10.151.32.189@o2ib) [1062854.119731] Lustre: Skipped 189 previous similar messages [1063474.192250] Lustre: MGS: Connection restored to 6fed5dac-4937-afa6-c461-48bb933f248e (at 10.151.8.49@o2ib) [1063474.192256] Lustre: Skipped 103 previous similar messages [1064161.457307] Lustre: MGS: Connection restored to cc31e46b-2ff5-8348-b5b3-bbc46c20de59 (at 10.149.2.205@o2ib313) [1064161.457313] Lustre: Skipped 769 previous similar messages [1064801.588795] Lustre: MGS: Connection restored to 340a9ca9-5aed-2d98-d5cb-6efb78c73be3 (at 10.151.23.52@o2ib) [1064801.588801] Lustre: Skipped 1789 previous similar messages [1065401.896650] Lustre: MGS: Connection restored to 15753a52-7258-d72e-05e1-5bd43bff4107 (at 10.151.3.99@o2ib) [1065401.896655] Lustre: Skipped 81 previous similar messages [1066052.851926] Lustre: MGS: Connection restored to db667276-4973-4a4a-8224-95848fc65722 (at 10.151.37.116@o2ib) [1066052.851931] Lustre: Skipped 83 previous similar messages [1066292.184278] Lustre: nbp8-MDT0000: haven't heard from client 6757538b-7ec7-c73b-bd91-78dd36a5a4b9 (at 10.149.14.223@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89975d657c00, cur 1591748380 expire 1591748230 last 1591748153 [1066292.258118] Lustre: Skipped 1 previous similar message [1066736.623603] Lustre: MGS: Connection restored to b05e87a6-ad63-0a72-c414-f9c23ebc8dd7 (at 10.151.52.40@o2ib) [1066736.623608] Lustre: Skipped 67 previous similar messages [1066953.208496] Lustre: nbp8-MDT0000: haven't heard from client 09c86843-554a-a0d4-43de-ca29ffc2fc37 (at 10.151.33.83@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999caba2c00, cur 1591749041 expire 1591748891 last 1591748814 [1066953.281161] Lustre: Skipped 1 previous similar message [1067065.123226] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1067065.157007] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.83@o2ib (338): c: 30, oc: 0, rc: 32 [1067066.123194] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1067066.156970] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.185@o2ib (339): c: 30, oc: 0, rc: 32 [1067078.123644] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1067078.157432] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.54@o2ib (303): c: 32, oc: 0, rc: 32 [1067336.655514] Lustre: MGS: Connection restored to b6f30dcb-a381-c875-3798-1bc6cf47834d (at 10.151.12.200@o2ib) [1067336.655521] Lustre: Skipped 94 previous similar messages [1068004.464560] Lustre: MGS: Connection restored to aec4ebbf-b9d5-a232-0513-99426e17af64 (at 10.151.18.52@o2ib) [1068004.464565] Lustre: Skipped 764 previous similar messages [1068628.180147] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1068628.213934] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.73@o2ib (303): c: 31, oc: 0, rc: 32 [1068628.590386] Lustre: MGS: Connection restored to 27c8a3ce-c503-bba9-967b-a7214d45e983 (at 10.151.63.73@o2ib) [1068628.590392] Lustre: Skipped 27 previous similar messages [1069237.538503] Lustre: MGS: Connection restored to 0e433bb0-d44d-fe03-5ff1-be159eeee052 (at 10.151.52.185@o2ib) [1069237.538508] Lustre: Skipped 195 previous similar messages [1070040.247625] Lustre: MGS: Connection restored to 753d37b1-be30-a9b6-e42b-73d588278e35 (at 10.151.12.172@o2ib) [1070040.247631] Lustre: Skipped 193 previous similar messages [1070814.876614] Lustre: MGS: Connection restored to 568947b0-5a0c-4b5a-7304-9bb4622aa4b5 (at 10.151.8.73@o2ib) [1070814.876619] Lustre: Skipped 205 previous similar messages [1071435.243407] Lustre: MGS: Connection restored to 121f69d4-adcb-b679-d459-d5fe7d82fbcc (at 10.149.15.67@o2ib313) [1071435.243412] Lustre: Skipped 145 previous similar messages [1071763.294652] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1071763.328427] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.89@o2ib (296): c: 32, oc: 0, rc: 32 [1071949.301517] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1071949.335313] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.50@o2ib (303): c: 32, oc: 0, rc: 32 [1071996.303220] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1071996.336991] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.129@o2ib (303): c: 32, oc: 0, rc: 32 [1072032.304475] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1072032.338263] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.73@o2ib (308): c: 31, oc: 0, rc: 32 [1072037.304721] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1072037.338506] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.185@o2ib (312): c: 31, oc: 0, rc: 32 [1072047.305025] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1072047.338796] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1072047.372564] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.103@o2ib (325): c: 31, oc: 0, rc: 32 [1072047.413766] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1072066.305717] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1072066.339503] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1072066.373278] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.141@o2ib (344): c: 31, oc: 0, rc: 32 [1072066.414485] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1072097.543918] Lustre: MGS: Connection restored to 1181e271-a412-496b-ad44-8bf244e4783a (at 10.151.55.158@o2ib) [1072097.543924] Lustre: Skipped 409 previous similar messages [1072356.316372] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1072356.350150] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1072356.383917] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.69@o2ib (295): c: 32, oc: 0, rc: 32 [1072356.424838] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1072739.117965] Lustre: MGS: Connection restored to acd1749a-6ba3-343e-4da5-b149f71c3079 (at 10.149.11.184@o2ib313) [1072739.117971] Lustre: Skipped 43 previous similar messages [1072780.422134] Lustre: nbp8-MDT0000: haven't heard from client 0accae6a-2401-7888-cff5-01a542e41868 (at 10.141.7.59@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d1e6f8400, cur 1591754868 expire 1591754718 last 1591754641 [1072780.495411] Lustre: Skipped 3 previous similar messages [1073460.083487] Lustre: MGS: Connection restored to d3edc46e-8fc9-80f9-3291-06fb6a5ef157 (at 10.151.31.168@o2ib) [1073460.083493] Lustre: Skipped 77 previous similar messages [1073504.449470] Lustre: nbp8-MDT0000: haven't heard from client ec1ea688-044f-d054-485a-d4ea11c333d9 (at 10.151.37.184@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899b4f667c00, cur 1591755592 expire 1591755442 last 1591755365 [1073504.522443] Lustre: Skipped 1 previous similar message [1073536.359671] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 24 seconds [1073536.393734] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.184@o2ib (258): c: 29, oc: 0, rc: 32 [1074134.563829] Lustre: MGS: Connection restored to 78ca8674-7dc2-bd75-7bde-3521f8b3df02 (at 10.151.33.18@o2ib) [1074134.563835] Lustre: Skipped 25 previous similar messages [1074892.990976] Lustre: MGS: Connection restored to 491e00b0-454d-4700-45fb-c848daf07cd7 (at 10.151.37.184@o2ib) [1074892.990982] Lustre: Skipped 41 previous similar messages [1075124.417934] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1075124.451718] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.48@o2ib (303): c: 32, oc: 0, rc: 32 [1075629.170312] Lustre: MGS: Connection restored to 6e2067c6-8c8b-63e7-77ab-1e29fe662f4a (at 10.151.54.127@o2ib) [1075629.170318] Lustre: Skipped 201 previous similar messages [1076229.401444] Lustre: MGS: Connection restored to 86e70108-7c6f-2a01-11f8-02bc89f70429 (at 10.149.1.180@o2ib313) [1076229.401449] Lustre: Skipped 213 previous similar messages [1076755.477882] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1076755.511673] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.87@o2ib (267): c: 32, oc: 0, rc: 32 [1076949.708012] Lustre: MGS: Connection restored to d792b5db-e078-5ec7-3be6-e8f4da1137d7 (at 10.149.9.83@o2ib313) [1076949.708018] Lustre: Skipped 59 previous similar messages [1077631.243291] Lustre: MGS: Connection restored to 00062ffe-9beb-0c95-4d8a-a286495ea877 (at 10.151.1.94@o2ib) [1077631.243297] Lustre: Skipped 33 previous similar messages [1078342.089343] Lustre: MGS: Connection restored to e5f389e8-c4bd-5de2-0f25-85188aa00fa2 (at 10.151.39.104@o2ib) [1078342.089349] Lustre: Skipped 95 previous similar messages [1078362.627364] Lustre: nbp8-MDT0000: haven't heard from client 6bd20a5b-3356-bf55-2a26-380db66e701e (at 10.151.63.75@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a46bbcdc00, cur 1591760450 expire 1591760300 last 1591760223 [1078362.700113] Lustre: Skipped 1 previous similar message [1078833.554198] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1078833.587982] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.38@o2ib (303): c: 32, oc: 0, rc: 32 [1078961.116437] Lustre: MGS: Connection restored to 194f87cf-2dfd-9e22-a39a-2a1fe7314c52 (at 10.151.35.119@o2ib) [1078961.116443] Lustre: Skipped 385 previous similar messages [1080331.609363] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1080331.643136] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.49.97@o2ib (299): c: 32, oc: 0, rc: 32 [1080336.609468] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1080336.643255] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.52.146@o2ib (302): c: 32, oc: 0, rc: 32 [1080362.166923] Lustre: MGS: Connection restored to 04425314-f3f8-983e-2b9e-f3b4972b7d4d (at 10.151.44.54@o2ib) [1080362.166929] Lustre: Skipped 73 previous similar messages [1080362.294075] LNet: 77012:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.44.115@o2ib version 12/12 incarnation 1588902795222866/1591762396137448 [1080362.341916] LNet: 77012:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 1 previous similar message [1080363.756174] LNet: 77012:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.45.22@o2ib version 12/12 incarnation 1589041994164518/1591762398166662 [1080538.264123] Lustre: MGS: Connection restored to c9061bfa-7279-6584-5b0d-e1343a52150a (at 10.151.34.83@o2ib) [1080538.264129] Lustre: Skipped 85 previous similar messages [1080738.865180] LNet: 91452:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.63.75@o2ib version 12/12 incarnation 1590082871664446/1591762820119801 [1080738.912725] LNet: 91452:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 2 previous similar messages [1080738.950206] Lustre: MGS: Connection restored to 01f85f7e-4cb7-464c-eb6f-a0c6f79c9ebd (at 10.151.63.75@o2ib) [1080738.950209] Lustre: Skipped 15 previous similar messages [1081107.891447] Lustre: MGS: Connection restored to f2fff0e6-907c-5176-84b3-65e0a0aa94a3 (at 10.151.3.55@o2ib) [1081107.891452] Lustre: Skipped 3 previous similar messages [1081292.644631] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1081292.678417] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1081292.711913] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.225@o2ib (303): c: 32, oc: 0, rc: 32 [1081292.753119] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1081669.749495] Lustre: MGS: haven't heard from client fd53008d-5fbe-a510-5fbb-315188453afc (at 10.153.10.23@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89756a6d7800, cur 1591763757 expire 1591763607 last 1591763530 [1081669.820477] Lustre: Skipped 1 previous similar message [1081687.748190] Lustre: nbp8-MDT0000: haven't heard from client 7ee6c38e-e231-f1ab-4c97-1f4372cedba4 (at 10.153.10.23@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979d2666400, cur 1591763775 expire 1591763625 last 1591763548 [1082117.697884] Lustre: MGS: Connection restored to 0b233af1-c7c4-0039-c1ba-50615e3d4e23 (at 10.151.36.60@o2ib) [1082117.697890] Lustre: Skipped 125 previous similar messages [1082485.778670] Lustre: nbp8-MDT0000: haven't heard from client 182fa3ca-c2f7-257c-ae64-dcd4f5c5a5e9 (at 10.151.63.74@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c42870400, cur 1591764573 expire 1591764423 last 1591764346 [1082602.678590] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.63.74@o2ib version 12/12 incarnation 1590083645160663/1591764683295447 [1082732.522550] Lustre: MGS: Connection restored to f7f36d60-2704-964e-3032-45953d19ffd2 (at 10.151.31.160@o2ib) [1082732.522556] Lustre: Skipped 647 previous similar messages [1083332.783488] Lustre: nbp8-MDT0000: Connection restored to a91e2ce1-e58d-8197-8ede-1a6996d9d15f (at 10.151.47.29@o2ib) [1083332.783494] Lustre: Skipped 48 previous similar messages [1083366.720846] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1083366.754617] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.249@o2ib (295): c: 32, oc: 0, rc: 32 [1083554.817960] Lustre: nbp8-MDT0000: haven't heard from client 0d013c1a-2bfe-00c1-874b-bba56a6f9d49 (at 10.151.63.71@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898158a0e000, cur 1591765642 expire 1591765492 last 1591765415 [1083554.890682] Lustre: Skipped 1 previous similar message [1083608.130004] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.26.4@o2ib version 12/12 incarnation 1591752400044405/1591765691239944 [1083643.773324] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.25.235@o2ib version 12/12 incarnation 1590086727903638/1591765727702596 [1083657.254716] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.25.160@o2ib version 12/12 incarnation 1590081449104210/1591765741898616 [1083660.305188] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.25.233@o2ib version 12/12 incarnation 1590081360280974/1591765744788890 [1083679.733341] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1083679.767128] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.77@o2ib (332): c: 30, oc: 0, rc: 32 [1083687.422069] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.25.229@o2ib version 12/12 incarnation 1590081559417916/1591765772241243 [1083699.275623] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.63.70@o2ib version 12/12 incarnation 1590084980350851/1591765780150112 [1083729.255510] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.25.230@o2ib version 12/12 incarnation 1590081551839584/1591765813980698 [1083729.303024] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 1 previous similar message [1083740.734592] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1083740.768380] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.25.231@o2ib (303): c: 30, oc: 0, rc: 32 [1083754.735096] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1083754.768877] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.26.3@o2ib (303): c: 30, oc: 0, rc: 32 [1083763.618215] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.25.232@o2ib version 12/12 incarnation 1590081411772934/1591765848311542 [1083945.887915] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.63.74@o2ib version 12/12 incarnation 1591764683295447/1591766025528334 [1083945.940769] Lustre: MGS: Connection restored to e26e4c59-9661-6cae-bc49-a67026caf90e (at 10.151.63.74@o2ib) [1083945.940773] Lustre: Skipped 100 previous similar messages [1084080.747076] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1084080.780865] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.63.71@o2ib (303): c: 30, oc: 0, rc: 32 [1084553.714399] Lustre: MGS: Connection restored to 5d992a23-e0dc-a68c-1c8e-c93bc2f31bd2 (at 10.141.7.51@o2ib417) [1084553.714404] Lustre: Skipped 157 previous similar messages [1085179.664546] Lustre: MGS: Connection restored to fad47894-f22a-f6f4-a9f9-636fa2e507fd (at 10.151.25.161@o2ib) [1085179.664552] Lustre: Skipped 55 previous similar messages [1085324.883918] LustreError: 12646:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.141.6.222@o2ib417) returned error from blocking AST (req@ffff8994cd7a5a00 x1667962942241536 status -107 rc -107), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff8979c0285c40/0xa22cee3859f5c52d lrc: 4/0,0 mode: PR/PR res: [0x3608ac7eb:0x1a0a6:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.141.6.222@o2ib417 remote: 0x2d94d264d21e07af expref: 46 pid: 14105 timeout: 1085696 lvb_type: 0 [1085325.029563] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.141.6.222@o2ib417 was evicted due to a lock blocking callback time out: rc -107 [1085325.072812] LustreError: 5711:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 0s: evicting client at 10.141.6.222@o2ib417 ns: mdt-nbp8-MDT0000_UUID lock: ffff8979c0285c40/0xa22cee3859f5c52d lrc: 3/0,0 mode: PR/PR res: [0x3608ac7eb:0x1a0a6:0x0].0x0 bits 0x13/0x0 rrc: 5 type: IBT flags: 0x60200400000020 nid: 10.141.6.222@o2ib417 remote: 0x2d94d264d21e07af expref: 47 pid: 14105 timeout: 0 lvb_type: 0 [1085369.794552] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1085369.828322] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.133@o2ib (268): c: 32, oc: 0, rc: 32 [1085398.885668] Lustre: MGS: haven't heard from client 5ac80bbc-77d5-8505-4dd5-b0f21d053651 (at 10.141.6.222@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89681b7ad800, cur 1591767486 expire 1591767336 last 1591767259 [1085398.956647] Lustre: Skipped 33 previous similar messages [1086094.373504] Lustre: MGS: Connection restored to bba846d2-8c01-15da-358b-2dce1ec5744e (at 10.151.3.133@o2ib) [1086094.373510] Lustre: Skipped 165 previous similar messages [1086760.188210] Lustre: MGS: Connection restored to 1da6cc0c-9c54-1268-ba44-fdacb9d7a6fb (at 10.149.14.35@o2ib313) [1086760.188216] Lustre: Skipped 47 previous similar messages [1087341.957185] Lustre: nbp8-MDT0000: haven't heard from client 955381b3-4531-fdf4-8831-04b8920044ac (at 10.151.55.158@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3805dd400, cur 1591769429 expire 1591769279 last 1591769202 [1087342.957250] Lustre: MGS: haven't heard from client 3d9e6f46-a749-b5aa-82b2-3df69c49c465 (at 10.151.55.158@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c5f5e9c00, cur 1591769430 expire 1591769280 last 1591769203 [1087459.546591] Lustre: MGS: Connection restored to f707eab3-7a12-b4b5-9989-021b0c16d912 (at 10.141.2.23@o2ib417) [1087459.546597] Lustre: Skipped 303 previous similar messages [1088227.058624] Lustre: MGS: Connection restored to 05d5d6b8-b843-3bf3-0a1e-6a10f38c5d59 (at 10.151.22.41@o2ib) [1088227.058630] Lustre: Skipped 183 previous similar messages [1088273.991549] Lustre: nbp8-MDT0000: haven't heard from client c50800b7-a9f4-10e0-7b28-c3be950c80ef (at 10.151.55.158@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2fe6bb800, cur 1591770361 expire 1591770211 last 1591770134 [1088351.904174] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1088351.937960] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.158@o2ib (304): c: 30, oc: 0, rc: 32 [1088908.201915] Lustre: MGS: Connection restored to e154ec6d-9a5f-79db-1831-5e3efeb7449e (at 10.153.10.10@o2ib233) [1088908.201921] Lustre: Skipped 5 previous similar messages [1090204.543558] Lustre: MGS: Connection restored to bf1f48d7-5c75-7edd-f9b8-79a53048f73b (at 10.151.47.22@o2ib) [1090204.543564] Lustre: Skipped 933 previous similar messages [1090361.104185] Lustre: MGS: Connection restored to a2710192-b179-8890-67f8-2e7e97d4ee74 (at 10.151.35.62@o2ib) [1090361.104191] Lustre: Skipped 27 previous similar messages [1091079.033312] Lustre: MGS: Connection restored to 78ba67ce-346c-abca-8cf2-97efa750f521 (at 10.151.36.10@o2ib) [1091079.033318] Lustre: Skipped 1 previous similar message [1091335.151840] Process accounting resumed [1091501.829943] Lustre: MGS: Connection restored to cc893a10-2c7d-0435-be83-ca1474c38439 (at 10.151.4.219@o2ib) [1091501.829949] Lustre: Skipped 23 previous similar messages [1093057.832175] Lustre: MGS: Connection restored to b454b8e6-4481-f959-6899-584188191898 (at 10.151.13.187@o2ib) [1093057.832181] Lustre: Skipped 65 previous similar messages [1093182.788608] Lustre: MGS: Connection restored to a0f7c022-aec8-3c10-c32d-5f087e6f9a03 (at 10.151.37.126@o2ib) [1093182.788612] Lustre: Skipped 1 previous similar message [1093337.326827] Lustre: MGS: Connection restored to 1181e271-a412-496b-ad44-8bf244e4783a (at 10.151.55.158@o2ib) [1093337.326833] Lustre: Skipped 263 previous similar messages [1093672.168814] Lustre: MGS: Connection restored to fa43d24b-ddfe-005e-c6e3-7b917f425543 (at 10.151.2.204@o2ib) [1093672.168820] Lustre: Skipped 27 previous similar messages [1094307.651139] Lustre: MGS: Connection restored to f70102b7-455b-410c-d3b0-eaa46847fde9 (at 10.151.2.159@o2ib) [1094307.651144] Lustre: Skipped 55 previous similar messages [1094924.374392] Lustre: MGS: Connection restored to 11a76c06-767e-4805-9c13-08a564dddcaa (at 10.151.7.113@o2ib) [1094924.374398] Lustre: Skipped 105 previous similar messages [1095609.324877] Lustre: MGS: Connection restored to d92f37dd-d3bc-94e7-7a12-c6c4830116fc (at 10.151.2.108@o2ib) [1095609.324883] Lustre: Skipped 305 previous similar messages [1096211.650255] Lustre: MGS: Connection restored to 49ce8880-72b8-e3e9-3fe0-f12f13be0aa5 (at 10.151.47.77@o2ib) [1096211.650261] Lustre: Skipped 49 previous similar messages [1097430.435624] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1097430.435630] Lustre: Skipped 5 previous similar messages [1097611.585723] Lustre: MGS: Connection restored to db14d46f-682a-ef07-1196-0568dc925fb1 (at 10.151.35.64@o2ib) [1097611.585729] Lustre: Skipped 1 previous similar message [1098023.012210] Lustre: MGS: Connection restored to a2decbd0-ebc8-de77-1183-4c0b6660ece3 (at 10.151.30.12@o2ib) [1098023.012216] Lustre: Skipped 115 previous similar messages [1098904.781430] Lustre: MGS: Connection restored to fe05b9ff-236c-88c5-0539-0b1aa7b6a966 (at 10.151.23.66@o2ib) [1098904.781436] Lustre: Skipped 35 previous similar messages [1099347.306975] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1099347.340755] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.56@o2ib (281): c: 32, oc: 0, rc: 32 [1099838.034780] Lustre: MGS: Connection restored to 5c65e110-8051-4113-63ff-1f962a226797 (at 10.141.7.46@o2ib417) [1099838.034786] Lustre: Skipped 163 previous similar messages [1102230.636538] Lustre: MGS: Connection restored to fc01f508-4542-1fef-b429-27e5c0ab0420 (at 10.151.34.71@o2ib) [1102230.636543] Lustre: Skipped 7 previous similar messages [1102988.785068] Lustre: MGS: Connection restored to b8e63df8-d5b9-9a4d-3bd9-09a74c928e4d (at 10.149.14.60@o2ib313) [1102988.785074] Lustre: Skipped 89 previous similar messages [1103151.864520] Lustre: MGS: Connection restored to efa9d2c7-4be0-c525-ae0c-db622d1e8ce5 (at 10.149.15.195@o2ib313) [1103151.864526] Lustre: Skipped 19 previous similar messages [1103428.510833] Lustre: MGS: Connection restored to 966e1a77-e4c5-bfce-5a66-3b05da6c5ddb (at 10.151.47.33@o2ib) [1103428.510839] Lustre: Skipped 227 previous similar messages [1103493.664184] Lustre: MGS: Connection restored to efc8829a-9a88-ace0-b387-f5120016f3c6 (at 10.151.28.96@o2ib) [1103493.664190] Lustre: Skipped 1 previous similar message [1103766.481151] Lustre: MGS: Connection restored to 1ea4ac9e-10a3-65c8-ab7e-5f78956cb793 (at 10.151.3.157@o2ib) [1103766.481158] Lustre: Skipped 1 previous similar message [1103773.733534] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.14.181@o2ib version 12/12 incarnation 1588859519690232/1591785793781568 [1103773.781059] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 3 previous similar messages [1103835.471002] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1103835.504780] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.50@o2ib (311): c: 31, oc: 0, rc: 32 [1103846.471431] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1103846.505218] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.227@o2ib (323): c: 31, oc: 0, rc: 32 [1103860.471859] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1103860.505639] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.43@o2ib (336): c: 31, oc: 0, rc: 32 [1103863.471967] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1103863.505721] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.49@o2ib (338): c: 31, oc: 0, rc: 32 [1103874.472462] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1103874.506246] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1103874.540021] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.114@o2ib (351): c: 31, oc: 0, rc: 32 [1103874.580944] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1104068.479463] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1104068.513255] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.55@o2ib (303): c: 32, oc: 0, rc: 32 [1104476.460483] Lustre: MGS: Connection restored to cf52cf58-11a8-2d6b-b574-161833c111dd (at 10.151.33.12@o2ib) [1104476.460488] Lustre: Skipped 301 previous similar messages [1104651.592412] Lustre: MGS: haven't heard from client b6d55b5a-dafe-8dab-e02f-aff7ccc68934 (at 10.141.7.52@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983b9bcdc00, cur 1591786738 expire 1591786588 last 1591786511 [1104651.663129] Lustre: Skipped 1 previous similar message [1104671.591502] Lustre: nbp8-MDT0000: haven't heard from client 5f80e035-6906-d131-ec3f-337f3607d357 (at 10.141.7.52@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f0de38000, cur 1591786758 expire 1591786608 last 1591786531 [1105119.709014] Lustre: MGS: Connection restored to ed642340-3fd4-2be0-ab50-5ffbee1caf8b (at 10.151.14.99@o2ib) [1105119.709020] Lustre: Skipped 189 previous similar messages [1105563.534212] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1105563.567987] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.146@o2ib (303): c: 32, oc: 0, rc: 32 [1106488.568019] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1106488.601792] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.253@o2ib (298): c: 32, oc: 0, rc: 32 [1106778.638911] Lustre: MGS: Connection restored to 09224c84-a46d-f80f-49f7-dd6355905a65 (at 10.151.37.103@o2ib) [1106778.638916] Lustre: Skipped 7 previous similar messages [1106796.579267] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1106796.613043] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.96@o2ib (303): c: 32, oc: 0, rc: 32 [1107125.166321] Lustre: MGS: Connection restored to 8c487c07-4b76-fa63-23ff-00a2bf2a37ed (at 10.149.15.66@o2ib313) [1107125.166327] Lustre: Skipped 97 previous similar messages [1107369.296080] Lustre: MGS: Connection restored to c5f2ce6e-9021-d4d2-fa8b-c19c5726f19e (at 10.151.34.129@o2ib) [1107369.296085] Lustre: Skipped 5 previous similar messages [1107712.335926] Lustre: MGS: Connection restored to beb1ad93-2b9e-a131-fd30-3e4fe1a41eab (at 10.149.15.57@o2ib313) [1107712.335931] Lustre: Skipped 15 previous similar messages [1107946.621213] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1107946.654996] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.220@o2ib (279): c: 32, oc: 0, rc: 32 [1107955.622608] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1107955.656378] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.239@o2ib (291): c: 32, oc: 0, rc: 32 [1107960.621720] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1107960.655509] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.201@o2ib (303): c: 32, oc: 0, rc: 32 [1107964.621948] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1107964.655736] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.210@o2ib (243): c: 32, oc: 0, rc: 32 [1108023.624026] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1108023.657818] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1108023.691592] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.226@o2ib (293): c: 32, oc: 0, rc: 32 [1108023.732800] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1108315.391425] Lustre: MGS: Connection restored to ce249dc8-07d3-f6e8-1821-8129fca9bee6 (at 10.151.3.116@o2ib) [1108315.391434] Lustre: Skipped 5 previous similar messages [1108874.744235] Lustre: nbp8-MDT0000: haven't heard from client 6ed0cf93-1966-ab54-b9c1-31624dec7653 (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3fd859400, cur 1591790961 expire 1591790811 last 1591790734 [1108934.407732] Lustre: MGS: Connection restored to 579a4bd7-29e5-b7e3-b37d-f83f4285f9b2 (at 10.149.16.34@o2ib313) [1108934.407738] Lustre: Skipped 123 previous similar messages [1109138.754787] Lustre: nbp8-MDT0000: haven't heard from client 6fa10fea-e570-3b4d-0c4c-f04418a3e424 (at 10.149.7.101@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89784da15c00, cur 1591791225 expire 1591791075 last 1591790998 [1109138.828368] Lustre: Skipped 1 previous similar message [1109720.219691] Lustre: MGS: Connection restored to c463a1d6-d378-e7f1-9b12-b81fb84ee1d9 (at 10.151.39.114@o2ib) [1109720.219697] Lustre: Skipped 281 previous similar messages [1110452.429247] Lustre: MGS: Connection restored to 5b4d7143-8c7d-0803-ef20-4b2b6251b523 (at 10.149.16.17@o2ib313) [1110452.429254] Lustre: Skipped 157 previous similar messages [1111581.326468] Lustre: MGS: Connection restored to 4bbed37d-16c9-fe89-7381-e8b1ea13bb43 (at 10.151.11.142@o2ib) [1111581.326473] Lustre: Skipped 539 previous similar messages [1112227.777498] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1112227.811271] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.65@o2ib (304): c: 31, oc: 0, rc: 32 [1112231.319958] Lustre: MGS: Connection restored to 452efcb7-7627-bce2-1d65-ec156827488c (at 10.151.7.70@o2ib) [1112231.319964] Lustre: Skipped 75 previous similar messages [1112231.778717] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1112231.812502] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1112231.846269] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.128@o2ib (308): c: 31, oc: 0, rc: 32 [1112231.887476] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1112234.777819] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1112234.811589] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.33@o2ib (310): c: 31, oc: 0, rc: 32 [1112238.813773] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.17.61@o2ib version 12/12 incarnation 1588869891430968/1591794229081357 [1112238.861020] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 25 previous similar messages [1112239.778931] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1112239.812704] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1112239.846468] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.42@o2ib (315): c: 31, oc: 0, rc: 32 [1112239.887381] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1112251.778369] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1112251.812163] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1112251.845658] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.12@o2ib (323): c: 31, oc: 0, rc: 32 [1112251.886580] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1112268.778989] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1112268.812783] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 9 previous similar messages [1112268.846563] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.201@o2ib (344): c: 31, oc: 0, rc: 32 [1112268.887771] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 9 previous similar messages [1113579.291955] Lustre: MGS: Connection restored to 7dc2c4db-2679-35d4-ecc1-ecd743620175 (at 10.151.18.242@o2ib) [1113579.291961] Lustre: Skipped 71 previous similar messages [1113706.256085] Lustre: MGS: Connection restored to 579a4bd7-29e5-b7e3-b37d-f83f4285f9b2 (at 10.149.16.34@o2ib313) [1113706.256091] Lustre: Skipped 3 previous similar messages [1113858.001661] Lustre: MGS: Connection restored to e093138f-3cb2-c2da-1092-853ea0dcb2ec (at 10.151.37.106@o2ib) [1113858.001666] Lustre: Skipped 399 previous similar messages [1114547.164290] Lustre: MGS: Connection restored to 10e24b75-cacb-0306-b556-c9a11eddb543 (at 10.151.54.177@o2ib) [1114547.164296] Lustre: Skipped 43 previous similar messages [1114553.778967] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.43.172@o2ib version 12/12 incarnation 1588859516465174/1591796603070046 [1115739.711046] Lustre: MGS: Connection restored to aa4e8cea-0983-6a47-64df-052b3a531c4a (at 10.151.37.175@o2ib) [1115739.711052] Lustre: Skipped 439 previous similar messages [1115742.905942] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1115742.939737] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1115742.973510] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.72@o2ib (285): c: 32, oc: 0, rc: 32 [1115743.014431] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1115838.909541] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1115838.943319] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.63@o2ib (303): c: 32, oc: 0, rc: 32 [1116350.229604] Lustre: MGS: Connection restored to e16088f6-94f8-a8ed-ffca-232f63342869 (at 10.149.2.160@o2ib313) [1116350.229610] Lustre: Skipped 161 previous similar messages [1117082.171282] Lustre: MGS: Connection restored to 0b233af1-c7c4-0039-c1ba-50615e3d4e23 (at 10.151.36.60@o2ib) [1117082.171287] Lustre: Skipped 63 previous similar messages [1117733.910926] Lustre: MGS: Connection restored to 509d171d-9cfd-b1e0-813f-8b28060abdd4 (at 10.151.37.101@o2ib) [1117733.910931] Lustre: Skipped 83 previous similar messages [1118026.282721] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.35.58@o2ib version 12/12 incarnation 1591701812888204/1591800064918085 [1118026.329977] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 1 previous similar message [1118510.984431] Lustre: MGS: Connection restored to 688e3b20-eca3-0a67-62f7-78f0b3833248 (at 10.151.24.93@o2ib) [1118510.984436] Lustre: Skipped 31 previous similar messages [1119318.662963] Lustre: MGS: Connection restored to f5303ff0-3f2b-4994-eeab-a06f75a9211b (at 10.151.24.96@o2ib) [1119318.662969] Lustre: Skipped 185 previous similar messages [1119370.038776] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1119370.072561] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.29@o2ib (299): c: 32, oc: 0, rc: 32 [1120539.227092] Lustre: MGS: Connection restored to 2eb9e8c5-f11e-1af8-2bc3-6a38683ecb71 (at 10.151.54.61@o2ib) [1120539.227098] Lustre: Skipped 5 previous similar messages [1120651.969094] Lustre: MGS: Connection restored to 6198d60b-6120-dd05-5303-339f4d58ef70 (at 10.151.32.44@o2ib) [1120651.969100] Lustre: Skipped 5 previous similar messages [1120906.207779] Lustre: MGS: Connection restored to 66784cf6-e519-dbf8-3c93-9454948265bd (at 10.151.31.158@o2ib) [1120906.207785] Lustre: Skipped 23 previous similar messages [1121225.063248] Lustre: MGS: Connection restored to 8f7389f6-17c6-a2cc-4348-945980bfc69e (at 10.151.57.21@o2ib) [1121225.063254] Lustre: Skipped 49 previous similar messages [1121678.658931] LNet: 77012:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.47.85@o2ib version 12/12 incarnation 1591659793446953/1591803758539159 [1121752.217321] Lustre: nbp8-MDT0000: haven't heard from client ee9c16dd-c5f9-38a4-8cbc-90627e7dbf40 (at 10.151.47.85@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899a16ba9c00, cur 1591803838 expire 1591803688 last 1591803611 [1121752.290001] Lustre: Skipped 1 previous similar message [1121813.128613] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1121813.162417] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.71@o2ib (303): c: 32, oc: 0, rc: 32 [1121888.995104] Lustre: MGS: Connection restored to 54670120-691d-dee4-9c2d-aa8c0e011a58 (at 10.149.11.137@o2ib313) [1121888.995109] Lustre: Skipped 99 previous similar messages [1122257.236869] LNet: 77012:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.34.129@o2ib version 12/12 incarnation 1591666399734783/1591804337524467 [1122302.248006] Lustre: MGS: haven't heard from client fd859681-1bc0-2c20-4405-fffb5900c352 (at 10.151.34.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f44b54400, cur 1591804388 expire 1591804238 last 1591804161 [1122302.318414] Lustre: Skipped 1 previous similar message [1122314.236489] Lustre: nbp8-MDT0000: haven't heard from client 7703dcaa-ea37-a00c-ed64-fb575e61a7ab (at 10.151.34.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982c319d800, cur 1591804400 expire 1591804250 last 1591804173 [1122350.148460] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1122350.182248] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.85@o2ib (303): c: 32, oc: 0, rc: 32 [1122497.147625] Lustre: MGS: Connection restored to d971419a-c125-5917-5088-53240cba29cf (at 10.151.47.90@o2ib) [1122497.147631] Lustre: Skipped 433 previous similar messages [1123101.587410] Lustre: MGS: Connection restored to e7865f38-16c4-74e6-29a2-4895a21ae019 (at 10.149.8.8@o2ib313) [1123101.587415] Lustre: Skipped 253 previous similar messages [1123775.252474] Lustre: MGS: Connection restored to c463a1d6-d378-e7f1-9b12-b81fb84ee1d9 (at 10.151.39.114@o2ib) [1123775.252480] Lustre: Skipped 199 previous similar messages [1124680.758232] Lustre: MGS: Connection restored to 23046b77-fff2-a069-c300-157c5347569e (at 10.153.10.13@o2ib233) [1124680.758238] Lustre: Skipped 185 previous similar messages [1125290.406898] Lustre: MGS: Connection restored to d84e0267-beb9-c0de-4324-18a6454a9b69 (at 10.151.56.63@o2ib) [1125290.406904] Lustre: Skipped 997 previous similar messages [1125345.046222] LNet: 4180:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.6.61@o2ib version 12/12 incarnation 1589535185639277/1591807378392724 [1125352.640987] LNet: 77614:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.6.57@o2ib version 12/12 incarnation 1589535185825835/1591807381108765 [1125370.259602] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1125370.293376] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.77@o2ib (299): c: 32, oc: 0, rc: 32 [1125403.261818] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1125403.295598] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.42@o2ib (303): c: 32, oc: 0, rc: 32 [1125900.421292] Lustre: MGS: Connection restored to b7f74e2e-0345-258b-2e74-7fd1c3581dc9 (at 10.151.37.152@o2ib) [1125900.421298] Lustre: Skipped 89 previous similar messages [1126556.529572] Lustre: MGS: Connection restored to 7ea6a74d-99cf-b378-d191-d47876ff0b02 (at 10.151.14.134@o2ib) [1126556.529578] Lustre: Skipped 39 previous similar messages [1126674.397702] Lustre: nbp8-MDT0000: haven't heard from client d36359a8-349f-9481-2579-5a8a482548ae (at 10.153.16.104@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897988a40800, cur 1591808760 expire 1591808610 last 1591808533 [1127143.325867] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1127143.359653] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.41.218@o2ib (207): c: 32, oc: 0, rc: 32 [1127144.324978] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1127144.358759] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.41.221@o2ib (223): c: 32, oc: 0, rc: 32 [1127145.326017] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1127227.328045] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1127227.361823] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.223@o2ib (303): c: 32, oc: 0, rc: 32 [1127227.403063] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1127333.510085] Lustre: MGS: Connection restored to 834f1aca-5153-3898-7a00-12a221475c48 (at 10.151.50.38@o2ib) [1127333.510091] Lustre: Skipped 95 previous similar messages [1128007.628210] Lustre: MGS: Connection restored to a825a0f2-ef1c-ceb9-4d20-882b1ed9239e (at 10.149.9.213@o2ib313) [1128007.628216] Lustre: Skipped 295 previous similar messages [1128608.554627] Lustre: MGS: Connection restored to 35649a01-0091-e06c-f653-94099db692cf (at 10.151.47.54@o2ib) [1128608.554633] Lustre: Skipped 447 previous similar messages [1128952.391422] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1128952.425209] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.49.172@o2ib (296): c: 32, oc: 0, rc: 32 [1128988.392809] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1128988.426595] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.49.144@o2ib (303): c: 32, oc: 0, rc: 32 [1129163.824468] LNet: 3322:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.38.79@o2ib version 12/12 incarnation 1591725195274425/1591811243884168 [1129163.824552] Lustre: 14126:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1591811220/real 1591811249] req@ffff89803f361680 x1667965573981248/t0(0) o104->nbp8-MDT0000@10.151.38.79@o2ib:15/16 lens 296/224 e 0 to 1 dl 1591811843 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 [1129163.824555] Lustre: 14126:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [1129164.001680] LustreError: 14126:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.151.38.79@o2ib) returned error from blocking AST (req@ffff89803f361680 x1667965573981248 status -107 rc -107), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff897fb5726540/0xa22cee3963d5a736 lrc: 4/0,0 mode: PR/PR res: [0x3608af696:0x15c70:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.151.38.79@o2ib remote: 0x99a80cfc4d39e0c8 expref: 12 pid: 14085 timeout: 1129533 lvb_type: 0 [1129164.145658] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.151.38.79@o2ib was evicted due to a lock blocking callback time out: rc -107 [1129164.188091] LustreError: 5711:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 29s: evicting client at 10.151.38.79@o2ib ns: mdt-nbp8-MDT0000_UUID lock: ffff897fb5726540/0xa22cee3963d5a736 lrc: 3/0,0 mode: PR/PR res: [0x3608af696:0x15c70:0x0].0x0 bits 0x13/0x0 rrc: 3 type: IBT flags: 0x60200400000020 nid: 10.151.38.79@o2ib remote: 0x99a80cfc4d39e0c8 expref: 13 pid: 14085 timeout: 0 lvb_type: 0 [1129211.492865] Lustre: MGS: haven't heard from client 003a596c-6230-b159-715a-d1b076a5dcdc (at 10.151.38.79@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897901452400, cur 1591811297 expire 1591811147 last 1591811070 [1129211.563064] Lustre: Skipped 1 previous similar message [1129303.092043] Lustre: MGS: Connection restored to 547b9623-d1a5-71e9-a26a-5ea6524586b4 (at 10.151.3.31@o2ib) [1129303.092049] Lustre: Skipped 59 previous similar messages [1130039.149856] Lustre: MGS: Connection restored to e576462e-1830-4e05-6af9-b3e4107a642a (at 10.151.23.124@o2ib) [1130039.149862] Lustre: Skipped 63 previous similar messages [1130766.023231] Lustre: MGS: Connection restored to 9745ed56-678f-9470-f482-18f40f0e21ff (at 10.151.9.75@o2ib) [1130766.023235] Lustre: Skipped 297 previous similar messages [1131474.364926] Lustre: MGS: Connection restored to 481bd6cc-3df1-0543-f566-6edebc2162d4 (at 10.151.56.35@o2ib) [1131474.364932] Lustre: Skipped 103 previous similar messages [1131855.498066] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1131855.531854] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.17.131@o2ib (279): c: 32, oc: 0, rc: 32 [1132133.635753] Lustre: MGS: Connection restored to 7b834378-4135-4e02-a270-2423e77f76f3 (at 10.151.0.165@o2ib) [1132133.635759] Lustre: Skipped 27 previous similar messages [1132775.886007] Lustre: MGS: Connection restored to 1d3bf6f2-a1be-f150-caba-05909e2b7a81 (at 10.151.36.84@o2ib) [1132775.886013] Lustre: Skipped 227 previous similar messages [1132978.629356] Lustre: nbp8-MDT0000: haven't heard from client a0875726-39bf-0214-4dd8-81d426d7f9f4 (at 10.151.31.205@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897a8eb47800, cur 1591815064 expire 1591814914 last 1591814837 [1133075.542777] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1133075.576557] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.225@o2ib (322): c: 30, oc: 0, rc: 32 [1133077.542929] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1133077.576716] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.230@o2ib (325): c: 30, oc: 0, rc: 32 [1133094.543562] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1133094.577349] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1133094.611121] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.231@o2ib (342): c: 30, oc: 0, rc: 32 [1133094.652329] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1133591.201591] Lustre: MGS: Connection restored to 328daad6-f92f-2ad0-50e0-f4976da7d75b (at 10.151.51.123@o2ib) [1133591.201596] Lustre: Skipped 133 previous similar messages [1134332.790625] Lustre: MGS: Connection restored to 844fc16b-2203-93b9-369f-b0b6766b291f (at 10.151.29.64@o2ib) [1134332.790632] Lustre: Skipped 147 previous similar messages [1135013.787589] Lustre: MGS: Connection restored to c2d140ab-f9bd-23ff-13ec-46bf85c90fbd (at 10.151.34.131@o2ib) [1135013.787595] Lustre: Skipped 105 previous similar messages [1135612.635628] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1135612.669409] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.64@o2ib (282): c: 32, oc: 0, rc: 32 [1135628.295867] Lustre: MGS: Connection restored to 81df8a59-e8bc-88a0-d563-cb62440a832e (at 10.149.14.149@o2ib313) [1135628.295873] Lustre: Skipped 59 previous similar messages [1136234.199365] Lustre: MGS: Connection restored to 193d505f-8c5a-5998-f7bb-3beda6484aea (at 10.151.14.216@o2ib) [1136234.199370] Lustre: Skipped 83 previous similar messages [1136876.343370] Lustre: MGS: Connection restored to 7e4eb75b-3b41-6902-0d22-1de796666cb2 (at 10.151.54.159@o2ib) [1136876.343375] Lustre: Skipped 501 previous similar messages [1137485.321053] Lustre: MGS: Connection restored to 595dbf58-c97d-f04e-2429-f212451b9542 (at 10.151.38.158@o2ib) [1137485.321059] Lustre: Skipped 33 previous similar messages [1138140.989230] Lustre: MGS: Connection restored to d34a7f1e-aabe-16b7-559c-465c7c0a38c6 (at 10.151.28.208@o2ib) [1138140.989235] Lustre: Skipped 121 previous similar messages [1138850.071569] Lustre: MGS: Connection restored to c50e8e72-55e9-b08a-e54f-c7735a28a937 (at 10.151.54.152@o2ib) [1138850.071575] Lustre: Skipped 609 previous similar messages [1139537.111664] Lustre: MGS: Connection restored to a4b5ec7c-e596-26cb-4a7d-2e62b9bf84a1 (at 10.151.56.180@o2ib) [1139537.111670] Lustre: Skipped 149 previous similar messages [1140393.832203] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1140393.832209] Lustre: Skipped 151 previous similar messages [1141186.788991] Lustre: MGS: Connection restored to 6f92d342-50eb-aad2-fbb8-d6cfd334ddc4 (at 10.151.16.69@o2ib) [1141186.788997] Lustre: Skipped 181 previous similar messages [1141810.142485] Lustre: MGS: Connection restored to 9558b3c3-1842-d7e1-9f0d-456e260294ef (at 10.151.31.159@o2ib) [1141810.142491] Lustre: Skipped 757 previous similar messages [1142460.731173] Lustre: MGS: Connection restored to 6b7b8cac-303e-8333-bed8-ceee45cb4a37 (at 10.151.24.188@o2ib) [1142460.731178] Lustre: Skipped 167 previous similar messages [1143103.452420] Lustre: MGS: Connection restored to cfa2ade4-f63c-df7a-1ed1-87d1d0368d55 (at 10.151.19.125@o2ib) [1143103.452425] Lustre: Skipped 27 previous similar messages [1143844.576019] Lustre: MGS: Connection restored to b46a6b69-1229-41a3-8711-16156f838b5b (at 10.151.32.48@o2ib) [1143844.576024] Lustre: Skipped 287 previous similar messages [1144462.563257] Lustre: MGS: Connection restored to c50e8e72-55e9-b08a-e54f-c7735a28a937 (at 10.151.54.152@o2ib) [1144462.563262] Lustre: Skipped 191 previous similar messages [1145103.284902] Lustre: MGS: Connection restored to cb1ae972-733b-48a0-841d-b2a3f079552c (at 10.151.35.166@o2ib) [1145103.284906] Lustre: Skipped 83 previous similar messages [1145256.078859] Lustre: nbp8-MDT0000: haven't heard from client a7d1c5ec-d95d-764e-9af4-2f39c7bab1da (at 10.141.2.23@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a07badb000, cur 1591827341 expire 1591827191 last 1591827114 [1145256.152137] Lustre: Skipped 9 previous similar messages [1145262.083126] Lustre: MGS: haven't heard from client c55eec71-7b85-bf0e-b8a7-10fa6b4ff634 (at 10.141.2.23@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d4b047000, cur 1591827347 expire 1591827197 last 1591827120 [1145821.481432] Lustre: MGS: Connection restored to 0bc8a76d-a687-1049-edc8-6b1ff5cebdb2 (at 10.151.59.220@o2ib) [1145821.481438] Lustre: Skipped 183 previous similar messages [1146524.483834] Lustre: MGS: Connection restored to 883df511-918e-d63a-c05e-00ec76e92dac (at 10.151.37.224@o2ib) [1146524.483840] Lustre: Skipped 21 previous similar messages [1146749.134591] Lustre: nbp8-MDT0000: haven't heard from client 2a42d73d-1dcd-6894-2f92-c98bce6209db (at 10.153.12.82@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8993a4a1cc00, cur 1591828834 expire 1591828684 last 1591828607 [1147145.178755] Lustre: MGS: Connection restored to 7b49f8f8-2761-99fc-4a17-2d78a2f2ea84 (at 10.141.2.33@o2ib417) [1147145.178761] Lustre: Skipped 11 previous similar messages [1147573.075770] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1147573.109551] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.76@o2ib (272): c: 32, oc: 0, rc: 32 [1147962.444844] Lustre: MGS: Connection restored to 66c6f4c6-3b3b-5846-bd2d-d59523baddd6 (at 10.151.30.192@o2ib) [1147962.444850] Lustre: Skipped 129 previous similar messages [1148664.683227] Lustre: MGS: Connection restored to 34d728e8-11f0-d907-b2aa-653a1add67b6 (at 10.151.24.12@o2ib) [1148664.683233] Lustre: Skipped 189 previous similar messages [1149186.135059] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1149186.168839] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.46@o2ib (303): c: 32, oc: 0, rc: 32 [1149266.566516] Lustre: MGS: Connection restored to ae5168f3-720a-eee6-9c3f-1a6c7bd2b341 (at 10.151.7.130@o2ib) [1149266.566521] Lustre: Skipped 105 previous similar messages [1150068.169462] Lustre: MGS: Connection restored to a3b24be3-e3de-90e9-0616-fb82a84e3a5b (at 10.151.51.34@o2ib) [1150068.169468] Lustre: Skipped 311 previous similar messages [1150574.185949] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1150574.219733] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.172@o2ib (303): c: 32, oc: 0, rc: 32 [1150773.855943] Lustre: MGS: Connection restored to b3adebfe-1118-b6d9-1f36-d342231f2de8 (at 10.151.19.69@o2ib) [1150773.855950] Lustre: Skipped 263 previous similar messages [1151399.275502] Lustre: MGS: Connection restored to 63cb3bfc-2df9-12a0-76a3-bca61cd342a3 (at 10.151.50.154@o2ib) [1151399.275508] Lustre: Skipped 109 previous similar messages [1151470.217852] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1151470.251648] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.148@o2ib (272): c: 32, oc: 0, rc: 32 [1151773.228992] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1151773.262780] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.93@o2ib (286): c: 32, oc: 0, rc: 32 [1152104.947279] Lustre: MGS: Connection restored to b0644e63-d0ee-1295-9290-92412469501e (at 10.151.55.178@o2ib) [1152104.947285] Lustre: Skipped 93 previous similar messages [1152807.230083] Lustre: MGS: Connection restored to bc9eca0a-6de4-bd87-20ca-3c420ac369d3 (at 10.151.52.144@o2ib) [1152807.230088] Lustre: Skipped 699 previous similar messages [1153422.139852] Lustre: MGS: Connection restored to 6666833b-0be6-8aa4-f71b-23e5207f4b2e (at 10.151.23.181@o2ib) [1153422.139858] Lustre: Skipped 75 previous similar messages [1154084.905880] Lustre: MGS: Connection restored to eca8a829-4d52-3aee-246e-cb948cbfa8f0 (at 10.151.24.156@o2ib) [1154084.905885] Lustre: Skipped 9 previous similar messages [1154208.318149] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1154208.351944] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.107@o2ib (297): c: 32, oc: 0, rc: 32 [1154210.318136] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1154210.351923] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.110@o2ib (303): c: 32, oc: 0, rc: 32 [1154214.318287] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1154214.352097] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1154214.385587] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.118@o2ib (304): c: 32, oc: 0, rc: 32 [1154214.426794] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1154834.956223] Lustre: MGS: Connection restored to 78287f22-37e5-32b9-3e82-1f86a81f0bd1 (at 10.151.18.43@o2ib) [1154834.956229] Lustre: Skipped 187 previous similar messages [1155490.090771] Lustre: MGS: Connection restored to 3a9909f1-a2c8-ecbc-cd30-79262dcd0a4c (at 10.149.8.223@o2ib313) [1155490.090777] Lustre: Skipped 193 previous similar messages [1156136.715848] Lustre: MGS: Connection restored to e04aa286-8ffb-33bc-034f-d27bda706436 (at 10.151.24.122@o2ib) [1156136.715854] Lustre: Skipped 145 previous similar messages [1156777.581110] Lustre: MGS: Connection restored to 9a45c937-275f-d5c7-dfb6-6a69741f0020 (at 10.141.3.199@o2ib417) [1156777.581115] Lustre: Skipped 97 previous similar messages [1157162.427225] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1157162.461003] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.17@o2ib (303): c: 32, oc: 0, rc: 32 [1157432.152672] Lustre: MGS: Connection restored to d211235d-6c38-ba8e-1ed0-bb7162f9e19d (at 10.151.27.26@o2ib) [1157432.152678] Lustre: Skipped 29 previous similar messages [1157775.448562] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1157775.482349] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.187@o2ib (290): c: 32, oc: 0, rc: 32 [1158053.622651] Lustre: MGS: Connection restored to 7bddff31-1a70-dece-c888-eaf67df382e3 (at 10.151.35.86@o2ib) [1158053.622658] Lustre: Skipped 463 previous similar messages [1158060.151460] LNet: 83168:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.37.186@o2ib version 12/12 incarnation 1589676649872820/1591840064692587 [1158687.091450] Lustre: MGS: Connection restored to f9140604-0f7a-416d-d1d1-2b2b6c71eeb1 (at 10.149.1.114@o2ib313) [1158687.091456] Lustre: Skipped 51 previous similar messages [1159396.935017] Lustre: MGS: Connection restored to 7bf1e4ea-36a0-8675-0867-a90c57d84f2a (at 10.149.3.89@o2ib313) [1159396.935023] Lustre: Skipped 113 previous similar messages [1160002.653988] Lustre: MGS: Connection restored to b2b90a42-00de-3ea2-0b8e-6101bedbd728 (at 10.149.3.40@o2ib313) [1160002.653993] Lustre: Skipped 295 previous similar messages [1160612.290866] Lustre: MGS: Connection restored to bf9c69e4-6bab-9f03-a0d5-63dd777a5ad8 (at 10.151.56.151@o2ib) [1160612.290872] Lustre: Skipped 467 previous similar messages [1161299.829112] Lustre: MGS: Connection restored to 06512bed-51d7-050b-fc12-e53c15168547 (at 10.151.2.187@o2ib) [1161299.829118] Lustre: Skipped 17 previous similar messages [1162067.266689] Lustre: MGS: Connection restored to 8681211d-0a36-85d4-aec3-429909c6879d (at 10.151.9.191@o2ib) [1162067.266695] Lustre: Skipped 7 previous similar messages [1162706.976741] Lustre: MGS: Connection restored to e04aa286-8ffb-33bc-034f-d27bda706436 (at 10.151.24.122@o2ib) [1162706.976747] Lustre: Skipped 107 previous similar messages [1163119.733469] Lustre: nbp8-MDT0000: haven't heard from client fc055929-248e-e5a9-5cc7-6ce79a08bc58 (at 10.141.2.57@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89964f3b6c00, cur 1591845204 expire 1591845054 last 1591844977 [1163119.806751] Lustre: Skipped 1 previous similar message [1163195.737488] Lustre: nbp8-MDT0000: haven't heard from client 3053c116-b8dc-e21e-b0fa-9badc697bd89 (at 10.149.7.101@o2ib313) in 222 seconds. I think it's dead, and I am evicting it. exp ffff899f6623d400, cur 1591845280 expire 1591845130 last 1591845058 [1163195.811060] Lustre: Skipped 1 previous similar message [1163338.185369] Lustre: MGS: Connection restored to 376340ca-f973-a5c2-a77d-e43ece77f629 (at 10.151.31.62@o2ib) [1163338.185375] Lustre: Skipped 133 previous similar messages [1163338.451721] LNet: 43884:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.37.147@o2ib version 12/12 incarnation 1591806880690410/1591845357256125 [1163950.878948] Lustre: MGS: Connection restored to 9db15f64-f0e5-ef83-c162-0e521fd67b63 (at 10.151.16.75@o2ib) [1163950.878954] Lustre: Skipped 81 previous similar messages [1164115.772574] Lustre: nbp8-MDT0000: haven't heard from client 3040cd61-45e8-3f34-e0c0-e4156317160c (at 10.151.26.3@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ba674000, cur 1591846200 expire 1591846050 last 1591845973 [1164115.844976] Lustre: Skipped 1 previous similar message [1164259.773946] LNet: 43884:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.26.3@o2ib version 12/12 incarnation 1591765846728850/1591846335317048 [1164563.098236] Lustre: MGS: Connection restored to 50213f50-8e05-5122-9f98-4d96c5dc8d6c (at 10.151.55.167@o2ib) [1164563.098242] Lustre: Skipped 21 previous similar messages [1165353.868145] Lustre: MGS: Connection restored to 8ea310d3-62a3-493b-7f89-28a43478d9ac (at 10.151.56.29@o2ib) [1165353.868150] Lustre: Skipped 117 previous similar messages [1166686.963295] Lustre: MGS: Connection restored to e501cbb4-0c12-d848-efb2-0241d5469d4d (at 10.151.55.182@o2ib) [1166686.963300] Lustre: Skipped 79 previous similar messages [1166700.775981] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1166700.809768] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.56.145@o2ib (303): c: 32, oc: 0, rc: 32 [1166786.274931] Lustre: MGS: Connection restored to 2ac85285-ce5a-514f-62f0-a1233bcfe6a5 (at 10.151.43.77@o2ib) [1166786.274937] Lustre: Skipped 3 previous similar messages [1167008.224442] Lustre: MGS: Connection restored to 1181e271-a412-496b-ad44-8bf244e4783a (at 10.151.55.158@o2ib) [1167008.224449] Lustre: Skipped 1 previous similar message [1167318.311684] Lustre: MGS: Connection restored to c03d7efa-e8a7-bd6e-986e-dc572364cafd (at 10.149.13.41@o2ib313) [1167318.311691] Lustre: Skipped 67 previous similar messages [1167348.800748] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1167348.834535] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.60@o2ib (303): c: 32, oc: 0, rc: 32 [1167965.690861] Lustre: MGS: Connection restored to adbdc4de-fcb8-2c86-44c6-642e41f3d169 (at 10.151.53.41@o2ib) [1167965.690866] Lustre: Skipped 191 previous similar messages [1168535.843601] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1168535.877385] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.99@o2ib (214): c: 32, oc: 0, rc: 32 [1168575.583370] Lustre: MGS: Connection restored to 1ed23c0b-8492-e76a-50d5-ef70f6efbf6d (at 10.151.24.148@o2ib) [1168575.583376] Lustre: Skipped 23 previous similar messages [1168635.848285] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1168635.882064] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.97@o2ib (303): c: 32, oc: 0, rc: 32 [1169143.865960] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1169143.899738] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.88@o2ib (298): c: 32, oc: 0, rc: 32 [1169251.863433] Lustre: MGS: Connection restored to ffc6b186-fc79-cae8-56e4-9e48190191ca (at 10.151.15.218@o2ib) [1169251.863440] Lustre: Skipped 83 previous similar messages [1169892.550771] Lustre: MGS: Connection restored to 3ba50e29-281a-2346-7450-e414786d5800 (at 10.151.23.129@o2ib) [1169892.550777] Lustre: Skipped 21 previous similar messages [1170547.463445] Lustre: MGS: Connection restored to d0efc896-5223-345a-46ba-2a5e2a43cfa6 (at 10.151.44.55@o2ib) [1170547.463450] Lustre: Skipped 35 previous similar messages [1171167.837148] Lustre: MGS: Connection restored to 507c84e3-c723-7f3e-2667-f410fb784a8a (at 10.151.0.128@o2ib) [1171167.837153] Lustre: Skipped 151 previous similar messages [1171791.669324] Lustre: MGS: Connection restored to e6ddd8ee-61d3-6d22-acc3-78e4983d0230 (at 10.149.7.227@o2ib313) [1171791.669330] Lustre: Skipped 91 previous similar messages [1172456.080814] Lustre: MGS: Connection restored to d34a7f1e-aabe-16b7-559c-465c7c0a38c6 (at 10.151.28.208@o2ib) [1172456.080820] Lustre: Skipped 323 previous similar messages [1173112.652517] Lustre: MGS: Connection restored to 25b3f01d-18bb-faef-6c82-8f5dea277c60 (at 10.151.28.232@o2ib) [1173112.652523] Lustre: Skipped 263 previous similar messages [1173735.962723] Lustre: MGS: Connection restored to e2b0bb37-e81c-cdd5-ceda-89e7343cedad (at 10.149.9.92@o2ib313) [1173735.962729] Lustre: Skipped 325 previous similar messages [1174086.138281] Lustre: nbp8-MDT0000: haven't heard from client e46a1594-6780-5113-3f34-a278121c17c3 (at 10.151.4.119@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dbb496000, cur 1591856170 expire 1591856020 last 1591855943 [1174086.210941] Lustre: Skipped 1 previous similar message [1174100.146995] Lustre: MGS: haven't heard from client be557a3d-cf09-e867-043a-a405c0bcd956 (at 10.151.4.119@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897a2f3b6c00, cur 1591856184 expire 1591856034 last 1591855957 [1174223.053299] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1174223.087078] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.119@o2ib (349): c: 31, oc: 0, rc: 32 [1174385.132530] Lustre: MGS: Connection restored to dc56b0bf-7342-99e3-256a-18feb35a252d (at 10.149.15.46@o2ib313) [1174385.132536] Lustre: Skipped 60 previous similar messages [1175072.895139] Lustre: MGS: Connection restored to a8d2ea3f-7452-a9c5-c076-fb0722f2c18d (at 10.149.10.69@o2ib313) [1175072.895148] Lustre: Skipped 124 previous similar messages [1175723.276522] Lustre: MGS: Connection restored to 92fe2e1b-f270-4401-3600-9a66b1703739 (at 10.151.54.126@o2ib) [1175723.276528] Lustre: Skipped 163 previous similar messages [1176385.483720] Lustre: MGS: Connection restored to 6e57124e-fba3-98a7-adef-acf64a98e4cb (at 10.151.38.135@o2ib) [1176385.483726] Lustre: Skipped 291 previous similar messages [1177053.079802] Lustre: MGS: Connection restored to 5b275745-11ae-1ff3-99af-6d1b007fed55 (at 10.151.18.216@o2ib) [1177053.079808] Lustre: Skipped 51 previous similar messages [1177877.855690] Lustre: MGS: Connection restored to 0d611e38-bf37-8e00-5882-7c1bace1839e (at 10.151.50.102@o2ib) [1177877.855696] Lustre: Skipped 89 previous similar messages [1178158.542445] Process accounting resumed [1178600.432293] Lustre: MGS: Connection restored to ed5cdcb7-d279-3ae2-a2c0-ec6964daa1c0 (at 10.151.53.145@o2ib) [1178600.432299] Lustre: Skipped 5 previous similar messages [1179346.156062] Lustre: MGS: Connection restored to ab10de21-e80b-081b-4d44-cb743284db9b (at 10.151.17.41@o2ib) [1179346.156067] Lustre: Skipped 15 previous similar messages [1179983.863576] Lustre: MGS: Connection restored to 0aec98a6-da1c-d0f1-5c0c-e6671b3c19ba (at 10.151.11.17@o2ib) [1179983.863582] Lustre: Skipped 3 previous similar messages [1180650.242789] Lustre: MGS: Connection restored to 8681211d-0a36-85d4-aec3-429909c6879d (at 10.151.9.191@o2ib) [1180650.242795] Lustre: Skipped 15 previous similar messages [1181278.036043] Lustre: MGS: Connection restored to 21a8d1f3-f7e5-1758-af01-c28781a6c765 (at 10.151.54.131@o2ib) [1181278.036049] Lustre: Skipped 133 previous similar messages [1182045.353307] Lustre: MGS: Connection restored to 1288088f-168d-a1c7-ced4-7f8d94ba61c2 (at 10.141.2.118@o2ib417) [1182045.353313] Lustre: Skipped 73 previous similar messages [1182438.355628] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1182438.389398] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.66@o2ib (303): c: 32, oc: 0, rc: 32 [1182887.476110] Lustre: MGS: Connection restored to d6504519-9be1-a809-c4b8-b9e3921a7252 (at 10.149.9.190@o2ib313) [1182887.476115] Lustre: Skipped 23 previous similar messages [1183719.451304] Lustre: MGS: Connection restored to e70bce5a-a073-6004-c7f9-6e3c321fdc1c (at 10.149.14.99@o2ib313) [1183719.451309] Lustre: Skipped 67 previous similar messages [1184225.422640] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1184225.456425] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.99@o2ib (302): c: 31, oc: 0, rc: 32 [1184229.421753] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1184229.455541] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.107@o2ib (305): c: 31, oc: 0, rc: 32 [1184231.421817] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1184231.455604] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.111@o2ib (308): c: 31, oc: 0, rc: 32 [1184235.421983] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1184235.455768] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.18@o2ib (308): c: 31, oc: 0, rc: 32 [1184246.422395] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1184246.456140] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.40@o2ib (319): c: 31, oc: 0, rc: 32 [1184267.424137] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1184267.457932] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.82@o2ib (343): c: 31, oc: 0, rc: 32 [1184421.741364] Lustre: MGS: Connection restored to 358f679c-a018-977c-ca83-ac0c46b8eab5 (at 10.151.23.186@o2ib) [1184421.741369] Lustre: Skipped 167 previous similar messages [1184437.520513] Lustre: nbp8-MDT0000: haven't heard from client 84049af0-ba8e-d5a9-8c94-66768dadc0eb (at 10.151.2.136@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974acf73000, cur 1591866521 expire 1591866371 last 1591866294 [1184513.524240] Lustre: nbp8-MDT0000: haven't heard from client 96fd9a76-da6a-9072-c75a-fb63c9ceb465 (at 10.151.2.174@o2ib) in 197 seconds. I think it's dead, and I am evicting it. exp ffff89a3c0384400, cur 1591866597 expire 1591866447 last 1591866400 [1184513.596955] Lustre: Skipped 7 previous similar messages [1184530.432771] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1184530.466564] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1184530.500344] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.136@o2ib (320): c: 30, oc: 0, rc: 32 [1184530.541266] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1184638.436828] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1184638.470622] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1184638.504410] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.151@o2ib (303): c: 30, oc: 0, rc: 32 [1184638.545332] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1184660.960992] LNet: 45706:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.2.197@o2ib version 12/12 incarnation 1591764576384699/1591866577273681 [1184940.544260] Lustre: nbp8-MDT0000: haven't heard from client b5476dc6-5d6c-3554-9f90-a6e994b51d6b (at 10.151.2.201@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a02737e000, cur 1591867024 expire 1591866874 last 1591866797 [1184940.616968] Lustre: Skipped 7 previous similar messages [1185017.450745] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1185017.484540] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1185017.518312] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.201@o2ib (304): c: 30, oc: 0, rc: 32 [1185017.559230] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1185061.067274] Lustre: MGS: Connection restored to 11f8f83f-6c9e-9b10-a0a2-59853a80f854 (at 10.151.6.13@o2ib) [1185061.067280] Lustre: Skipped 35 previous similar messages [1185116.544386] Lustre: nbp8-MDT0000: haven't heard from client 49215a3c-f563-19dd-bd3b-4b9181cb98c7 (at 10.151.3.30@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980cebe5c00, cur 1591867200 expire 1591867050 last 1591866973 [1185116.616833] Lustre: Skipped 7 previous similar messages [1185211.457897] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1185211.491689] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1185211.525454] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.30@o2ib (322): c: 31, oc: 0, rc: 32 [1185211.566088] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1185313.552544] Lustre: nbp8-MDT0000: haven't heard from client 7e215450-08f3-c0e2-6383-d49b327215e9 (at 10.151.6.13@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897562ab7000, cur 1591867397 expire 1591867247 last 1591867170 [1185313.624974] Lustre: Skipped 9 previous similar messages [1185771.049785] Lustre: MGS: Connection restored to 160fc507-f15b-8c47-568d-b0626b649fdc (at 10.149.2.129@o2ib313) [1185771.049791] Lustre: Skipped 62 previous similar messages [1186463.219605] Lustre: MGS: Connection restored to be896443-e400-ea39-3fc5-4787a3d96afb (at 10.151.10.148@o2ib) [1186463.219610] Lustre: Skipped 607 previous similar messages [1187192.060977] Lustre: MGS: Connection restored to cdfcfed0-4a96-a00d-5260-600ec85d1569 (at 10.151.9.16@o2ib) [1187192.060983] Lustre: Skipped 29 previous similar messages [1187815.554139] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1187815.587948] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 8 previous similar messages [1187815.621720] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.0.48@o2ib (304): c: 32, oc: 0, rc: 32 [1187815.662346] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 8 previous similar messages [1187865.637627] Lustre: MGS: Connection restored to e7d60ef3-f262-7555-1e4e-62dec98d4c0d (at 10.151.47.73@o2ib) [1187865.637632] Lustre: Skipped 133 previous similar messages [1188410.575999] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1188410.609787] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.143@o2ib (302): c: 32, oc: 0, rc: 32 [1188518.153229] Lustre: MGS: Connection restored to 44b6579d-6c4f-b381-07e7-77023a73cdca (at 10.151.36.67@o2ib) [1188518.153234] Lustre: Skipped 133 previous similar messages [1189183.057766] Lustre: MGS: Connection restored to 47402325-8a25-4a37-371e-8228960b3263 (at 10.151.2.51@o2ib) [1189183.057772] Lustre: Skipped 197 previous similar messages [1189784.878254] Lustre: MGS: Connection restored to b382c3d0-c73f-96ec-559d-3103650b82c8 (at 10.151.55.179@o2ib) [1189784.878260] Lustre: Skipped 81 previous similar messages [1190583.623787] Lustre: MGS: Connection restored to ede42a6c-9d91-5980-df03-02d9708448a9 (at 10.149.14.19@o2ib313) [1190583.623793] Lustre: Skipped 23 previous similar messages [1191338.136310] Lustre: MGS: Connection restored to bd27729b-9aea-ca43-8f97-140f9575c053 (at 10.151.30.69@o2ib) [1191338.136315] Lustre: Skipped 419 previous similar messages [1192991.866691] Lustre: MGS: Connection restored to 334c4ef2-5b45-49a5-f7ad-06f2b4fbb222 (at 10.151.35.21@o2ib) [1192991.866697] Lustre: Skipped 163 previous similar messages [1193442.756896] Lustre: MGS: Connection restored to 9d21eda3-8f0d-0cd0-a1c7-6d6779bb9357 (at 10.151.10.118@o2ib) [1193442.756902] Lustre: Skipped 29 previous similar messages [1193666.462710] Lustre: MGS: Connection restored to ab4ae0d9-731c-7a8c-acf5-c31d954d290b (at 10.151.37.123@o2ib) [1193666.462715] Lustre: Skipped 241 previous similar messages [1194061.206949] Lustre: MGS: Connection restored to 7254d0d2-948e-e40d-fb1e-02b516106534 (at 10.149.15.71@o2ib313) [1194061.206955] Lustre: Skipped 193 previous similar messages [1194368.892702] Lustre: nbp8-MDT0000: haven't heard from client 5e55ffdb-93da-054e-638b-fff42751a0e6 (at 10.141.6.53@o2ib417) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f7e3c0c00, cur 1591876452 expire 1591876302 last 1591876225 [1194368.965973] Lustre: Skipped 7 previous similar messages [1194694.848819] Lustre: MGS: Connection restored to 9d4a467e-6118-e8bf-86cb-a43caceaf7f9 (at 10.153.10.12@o2ib233) [1194694.848824] Lustre: Skipped 67 previous similar messages [1195411.650169] Lustre: MGS: Connection restored to 13d35757-3696-33ee-e44f-cff9d98a6edf (at 10.149.1.164@o2ib313) [1195411.650175] Lustre: Skipped 2499 previous similar messages [1196393.528427] Lustre: MGS: Connection restored to 9e1f58d0-acc9-498f-989f-4588d2ba53a5 (at 10.149.15.62@o2ib313) [1196393.528432] Lustre: Skipped 329 previous similar messages [1197155.872984] Lustre: MGS: Connection restored to f60ff9de-2f22-be6a-1fc0-e7b2b8739136 (at 10.151.54.151@o2ib) [1197155.872990] Lustre: Skipped 959 previous similar messages [1198241.067386] Lustre: MGS: Connection restored to 9c65b6a4-0c74-a44a-26da-b732a2f363fc (at 10.151.7.71@o2ib) [1198241.067391] Lustre: Skipped 223 previous similar messages [1198860.031413] Lustre: MGS: Connection restored to 1d271b4c-4f1f-16ba-1776-69afafdee991 (at 10.151.28.94@o2ib) [1198860.031419] Lustre: Skipped 1 previous similar message [1199956.206632] Lustre: MGS: Connection restored to 68bf2ead-cd41-1a9c-4120-6c05bb913c49 (at 10.149.12.157@o2ib313) [1199956.206638] Lustre: Skipped 191 previous similar messages [1200788.910053] Lustre: MGS: Connection restored to b8d95100-2b1b-f0e1-a23e-6691c941ea1d (at 10.149.3.214@o2ib313) [1200788.910059] Lustre: Skipped 187 previous similar messages [1202076.018507] Lustre: MGS: Connection restored to bd67eebe-76f4-ceaa-162d-5088c609a2f0 (at 10.141.7.59@o2ib417) [1202076.018513] Lustre: Skipped 71 previous similar messages [1202271.034239] Lustre: MGS: Connection restored to 75e14dc5-81c8-ee59-f32c-8a8f7ffffc4f (at 10.151.32.41@o2ib) [1202271.034244] Lustre: Skipped 5 previous similar messages [1202677.993528] Lustre: MGS: Connection restored to ed07eb39-6762-be17-4826-391c5b94a912 (at 10.151.35.48@o2ib) [1202677.993534] Lustre: Skipped 531 previous similar messages [1203143.796921] Lustre: MGS: Connection restored to 0439b7e9-be09-01ea-fcb7-b52dd00e6bbb (at 10.151.3.52@o2ib) [1203143.796927] Lustre: Skipped 71 previous similar messages [1203795.023764] Lustre: MGS: Connection restored to ed71334e-02c0-8898-c74d-aed5a51ca31e (at 10.149.6.47@o2ib313) [1203795.023769] Lustre: Skipped 303 previous similar messages [1204556.329714] Lustre: MGS: Connection restored to a041e8bc-df77-f027-14ae-2d51c857c6e8 (at 10.151.33.200@o2ib) [1204556.329718] Lustre: Skipped 121 previous similar messages [1205165.240850] Lustre: MGS: Connection restored to 4febddb7-20d1-cdb6-6562-55d819b59217 (at 10.149.8.217@o2ib313) [1205165.240855] Lustre: Skipped 157 previous similar messages [1205883.718073] Lustre: MGS: Connection restored to abcc727a-ffa9-b452-131d-58219112ee2e (at 10.149.9.193@o2ib313) [1205883.718078] Lustre: Skipped 177 previous similar messages [1206381.238748] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1206381.272535] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.51@o2ib (303): c: 32, oc: 0, rc: 32 [1206398.239399] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1206398.273198] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1206398.306685] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.50.172@o2ib (304): c: 32, oc: 0, rc: 32 [1206398.347893] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1206423.240288] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1206423.274079] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1206423.307567] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.60@o2ib (292): c: 32, oc: 0, rc: 32 [1206423.348487] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1206605.096342] Lustre: MGS: Connection restored to dc82c09d-cce4-4839-aaf4-b561803f1a3c (at 10.151.35.126@o2ib) [1206605.096347] Lustre: Skipped 327 previous similar messages [1207243.270539] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1207243.304325] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.58@o2ib (303): c: 32, oc: 0, rc: 32 [1207251.447404] Lustre: MGS: Connection restored to fd8baeeb-6822-1b16-bf68-b98bafc520a2 (at 10.151.24.161@o2ib) [1207251.447410] Lustre: Skipped 187 previous similar messages [1207556.371687] Lustre: MGS: haven't heard from client 7fd1c536-a8b2-dff0-a05b-5a1fbe814249 (at 10.151.27.19@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ab25fc00, cur 1591889639 expire 1591889489 last 1591889412 [1207556.441813] Lustre: Skipped 1 previous similar message [1207913.275713] Lustre: MGS: Connection restored to dd9fa453-03b2-e69a-1f97-dd51973a0f2f (at 10.141.2.45@o2ib417) [1207913.275718] Lustre: Skipped 57 previous similar messages [1208210.397132] Lustre: MGS: haven't heard from client 7fd1c536-a8b2-dff0-a05b-5a1fbe814249 (at 10.151.27.19@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a010e2cc00, cur 1591890293 expire 1591890143 last 1591890066 [1208580.669671] Lustre: MGS: Connection restored to 10b6f743-c649-58e9-7c3e-12abc25adcd0 (at 10.151.39.112@o2ib) [1208580.669677] Lustre: Skipped 103 previous similar messages [1208857.418989] Lustre: nbp8-MDT0000: haven't heard from client 5a020cb8-4958-073f-256f-c345d8883b3e (at 10.151.27.19@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3276a1400, cur 1591890940 expire 1591890790 last 1591890713 [1209073.430487] Lustre: MGS: haven't heard from client 7fd1c536-a8b2-dff0-a05b-5a1fbe814249 (at 10.151.27.19@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897259537800, cur 1591891156 expire 1591891006 last 1591890929 [1209183.176153] Lustre: MGS: Connection restored to 211389ea-0dfb-aa86-f6f0-9be827bab283 (at 10.141.3.19@o2ib417) [1209183.176159] Lustre: Skipped 32 previous similar messages [1209257.434814] Lustre: nbp8-MDT0000: haven't heard from client 5a020cb8-4958-073f-256f-c345d8883b3e (at 10.151.27.19@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a010e2a400, cur 1591891340 expire 1591891190 last 1591891113 [1209801.306464] Lustre: MGS: Connection restored to 66fae42f-8bef-6f1b-6967-4279fb40aec1 (at 10.151.56.131@o2ib) [1209801.306470] Lustre: Skipped 73 previous similar messages [1210817.030277] LNet: 4180:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.19@o2ib version 12/12 incarnation 1590085907425217/1591892894643483 [1210817.084253] Lustre: MGS: Connection restored to 7fd1c536-a8b2-dff0-a05b-5a1fbe814249 (at 10.151.27.19@o2ib) [1210817.084257] Lustre: Skipped 57 previous similar messages [1211449.431523] Lustre: MGS: Connection restored to 188f4164-4e56-0d28-baa9-af3beaaee61d (at 10.151.49.232@o2ib) [1211449.431529] Lustre: Skipped 77 previous similar messages [1212221.658850] Lustre: MGS: Connection restored to c244d7f5-0bf0-47a9-c75f-ecfa076678e6 (at 10.151.18.53@o2ib) [1212221.658855] Lustre: Skipped 53 previous similar messages [1212891.615971] Lustre: MGS: Connection restored to 7717b050-c91c-e04f-89e9-c199a14f3200 (at 10.149.3.7@o2ib313) [1212891.615977] Lustre: Skipped 13 previous similar messages [1213505.332749] Lustre: MGS: Connection restored to a082d163-ff02-4109-15be-b860c13fb9f9 (at 10.151.33.146@o2ib) [1213505.332755] Lustre: Skipped 219 previous similar messages [1214140.177004] Lustre: MGS: Connection restored to 5f476d21-99b3-661d-6a27-f24fd19bcf4f (at 10.151.54.70@o2ib) [1214140.177009] Lustre: Skipped 213 previous similar messages [1214902.008185] Lustre: MGS: Connection restored to bd67a9e6-d853-1004-508c-435743387b14 (at 10.149.9.243@o2ib313) [1214902.008191] Lustre: Skipped 81 previous similar messages [1215449.572585] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1215449.606372] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.153@o2ib (325): c: 31, oc: 0, rc: 32 [1215456.572862] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1215456.606649] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.113@o2ib (332): c: 31, oc: 0, rc: 32 [1215465.574240] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1215465.608032] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.31@o2ib (341): c: 31, oc: 0, rc: 32 [1215524.003231] Lustre: MGS: Connection restored to 96e55d1f-2741-d69e-e50a-bb5f996a274b (at 10.151.2.105@o2ib) [1215524.003236] Lustre: Skipped 263 previous similar messages [1216129.186349] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1216129.186355] Lustre: Skipped 169 previous similar messages [1216731.982261] Lustre: MGS: Connection restored to 92e73483-9cce-5be1-598f-ea0fb7f6c158 (at 10.151.47.65@o2ib) [1216731.982266] Lustre: Skipped 185 previous similar messages [1217140.634693] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217140.668479] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.240@o2ib (300): c: 32, oc: 0, rc: 32 [1217143.634802] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217143.668583] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.206@o2ib (238): c: 32, oc: 0, rc: 32 [1217147.634975] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217147.668764] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.22.52@o2ib (303): c: 32, oc: 0, rc: 32 [1217175.636047] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217175.669831] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1217175.703322] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.170@o2ib (284): c: 32, oc: 0, rc: 32 [1217175.744529] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1217200.636908] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217200.670684] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1217200.704472] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.219@o2ib (299): c: 32, oc: 0, rc: 32 [1217200.745678] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1217212.637415] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217212.671199] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [1217212.704976] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.143@o2ib (304): c: 32, oc: 0, rc: 32 [1217212.746190] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [1217234.638233] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1217234.672004] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1217234.705776] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.19.187@o2ib (304): c: 32, oc: 0, rc: 32 [1217234.746983] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1217609.087715] Lustre: MGS: Connection restored to dbcedce6-a0e4-dec2-0d7b-6abe96063990 (at 10.151.49.171@o2ib) [1217609.087720] Lustre: Skipped 59 previous similar messages [1218248.569955] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1218248.569961] Lustre: Skipped 215 previous similar messages [1219049.146632] Lustre: MGS: Connection restored to 1c0f817c-3480-2986-6f7e-20ce9794d9ab (at 10.151.37.93@o2ib) [1219049.146637] Lustre: Skipped 65 previous similar messages [1219724.064851] Lustre: MGS: Connection restored to 82226dfe-fe47-f8c4-b91f-76ab91536f95 (at 10.151.8.189@o2ib) [1219724.064857] Lustre: Skipped 643 previous similar messages [1220431.441284] Lustre: MGS: Connection restored to cc82af0d-df74-bb9e-a937-f1e621bbbedc (at 10.151.10.236@o2ib) [1220431.441289] Lustre: Skipped 179 previous similar messages [1221214.022176] Lustre: MGS: Connection restored to bb019e78-5cbf-186b-2863-02099c5b2441 (at 10.151.23.10@o2ib) [1221214.022182] Lustre: Skipped 149 previous similar messages [1221275.786202] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1221275.819983] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1221275.853761] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.153@o2ib (304): c: 32, oc: 0, rc: 32 [1221275.894967] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1221823.151731] Lustre: MGS: Connection restored to 6b70b08e-aa20-7acb-0e34-9b2a6b6423f8 (at 10.149.1.45@o2ib313) [1221823.151737] Lustre: Skipped 25 previous similar messages [1222564.721871] Lustre: MGS: Connection restored to ed066bd9-0bab-4267-6106-18de508c016d (at 10.149.9.34@o2ib313) [1222564.721877] Lustre: Skipped 141 previous similar messages [1223391.426753] Lustre: MGS: Connection restored to 44396008-8470-2e98-ca5a-d708f7dc8f21 (at 10.151.54.122@o2ib) [1223391.426759] Lustre: Skipped 97 previous similar messages [1224049.948769] Lustre: MGS: Connection restored to d792b5db-e078-5ec7-3be6-e8f4da1137d7 (at 10.149.9.83@o2ib313) [1224049.948775] Lustre: Skipped 739 previous similar messages [1224424.901759] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1224424.935546] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.52@o2ib (303): c: 32, oc: 0, rc: 32 [1224703.959967] Lustre: MGS: Connection restored to 2e6cc585-bccc-b5ad-4570-27e2d8acd02e (at 10.149.9.84@o2ib313) [1224703.959973] Lustre: Skipped 277 previous similar messages [1225034.924091] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1225034.957880] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.214@o2ib (224): c: 32, oc: 0, rc: 32 [1225037.925199] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1225037.958983] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1225037.992757] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.99@o2ib (304): c: 32, oc: 0, rc: 32 [1225038.033384] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1225042.924461] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1225042.958246] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1225042.991745] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.210@o2ib (268): c: 32, oc: 0, rc: 32 [1225043.032950] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1225055.924924] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1225055.958708] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.236@o2ib (303): c: 32, oc: 0, rc: 32 [1225071.925538] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1225071.959324] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.225@o2ib (299): c: 32, oc: 0, rc: 32 [1225090.926202] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1225090.959987] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1225090.993488] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.105@o2ib (304): c: 32, oc: 0, rc: 32 [1225091.034401] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1225378.666425] Lustre: MGS: Connection restored to 6206d401-b38d-1f40-14ad-beacb7f96aa1 (at 10.151.49.216@o2ib) [1225378.666430] Lustre: Skipped 239 previous similar messages [1226036.798989] Lustre: MGS: Connection restored to 56042238-17ff-e43c-7a9f-3d2465e58f19 (at 10.151.3.47@o2ib) [1226036.798995] Lustre: Skipped 199 previous similar messages [1226039.961156] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1226039.994950] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1226040.028718] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.134@o2ib (304): c: 32, oc: 0, rc: 32 [1226040.069918] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1226066.962035] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1226066.995815] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.119@o2ib (303): c: 32, oc: 0, rc: 32 [1226232.968231] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1226233.002023] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1226233.035510] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.60@o2ib (304): c: 32, oc: 0, rc: 32 [1226233.076431] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1226637.462517] Lustre: MGS: Connection restored to 547b9623-d1a5-71e9-a26a-5ea6524586b4 (at 10.151.3.31@o2ib) [1226637.462522] Lustre: Skipped 81 previous similar messages [1227302.065359] Lustre: MGS: Connection restored to 81123f69-287f-05d1-3877-ff32a08a7988 (at 10.151.6.182@o2ib) [1227302.065365] Lustre: Skipped 117 previous similar messages [1228079.679460] Lustre: MGS: Connection restored to 91deb8cf-6d6c-b821-8a39-ab152f355a49 (at 10.151.35.194@o2ib) [1228079.679466] Lustre: Skipped 49 previous similar messages [1228791.596105] Lustre: MGS: Connection restored to 09af0810-d3df-a8aa-eb43-dfff707dbd89 (at 10.151.48.223@o2ib) [1228791.596111] Lustre: Skipped 41 previous similar messages [1229409.763319] Lustre: MGS: Connection restored to 42d0e820-e0c1-f8a2-ecc5-f11c6fd5c663 (at 10.149.3.11@o2ib313) [1229409.763325] Lustre: Skipped 81 previous similar messages [1230010.237445] Lustre: MGS: Connection restored to 03aa1959-ee29-6944-278e-fd78497a9bcb (at 10.149.12.143@o2ib313) [1230010.237451] Lustre: Skipped 291 previous similar messages [1230689.808260] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1230689.808264] Lustre: Skipped 185 previous similar messages [1231553.217195] Lustre: MGS: Connection restored to ad54b03b-55dc-fb01-1127-9fa6507a5267 (at 10.151.31.228@o2ib) [1231553.217201] Lustre: Skipped 179 previous similar messages [1232339.303701] Lustre: MGS: Connection restored to 25c9cb27-29ed-5e91-274c-831c5a1dae2a (at 10.151.19.40@o2ib) [1232339.303707] Lustre: Skipped 99 previous similar messages [1233032.437059] Lustre: MGS: Connection restored to 5ef20329-07e2-be6c-68d4-e47bf38f59dd (at 10.151.36.53@o2ib) [1233032.437065] Lustre: Skipped 9 previous similar messages [1233385.323677] Lustre: nbp8-MDT0000: haven't heard from client ca0e9813-60b3-73ed-01a9-c8e2bb67f034 (at 10.153.17.56@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983d7f6a400, cur 1591915467 expire 1591915317 last 1591915240 [1233647.874269] Lustre: MGS: Connection restored to c392ba68-8603-56b2-e9a6-88d2c8b58eff (at 10.151.1.226@o2ib) [1233647.874275] Lustre: Skipped 535 previous similar messages [1234547.210579] Lustre: MGS: Connection restored to 9805192f-4684-b163-f68a-a0e12d278da8 (at 10.151.31.116@o2ib) [1234547.210585] Lustre: Skipped 89 previous similar messages [1235394.515872] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [1235394.515877] Lustre: Skipped 365 previous similar messages [1236049.763387] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1236049.763393] Lustre: Skipped 67 previous similar messages [1236768.356538] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1236768.390317] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.178@o2ib (303): c: 32, oc: 0, rc: 32 [1236804.471764] Lustre: MGS: Connection restored to cfd65234-a1c6-a41d-a9f5-92798a0b911e (at 10.151.3.35@o2ib) [1236804.471774] Lustre: Skipped 31 previous similar messages [1237437.732246] Lustre: MGS: Connection restored to c25ff042-a40c-74df-9a6b-e7d68c8cb7b5 (at 10.141.6.106@o2ib417) [1237437.732251] Lustre: Skipped 253 previous similar messages [1237480.382583] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1237480.416376] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.194@o2ib (229): c: 32, oc: 0, rc: 32 [1238681.614020] Lustre: MGS: Connection restored to 8e76d51f-ea3e-8e4f-f510-31ff9cb5593e (at 10.141.2.12@o2ib417) [1238681.614026] Lustre: Skipped 377 previous similar messages [1238778.303183] Lustre: MGS: Connection restored to 2331a165-6364-a9ca-8de7-212d145b8fa8 (at 10.151.56.51@o2ib) [1238778.303195] Lustre: Skipped 39 previous similar messages [1238930.676115] Lustre: MGS: Connection restored to e093138f-3cb2-c2da-1092-853ea0dcb2ec (at 10.151.37.106@o2ib) [1238930.676120] Lustre: Skipped 1 previous similar message [1239478.471522] Lustre: MGS: Connection restored to 25c9cb27-29ed-5e91-274c-831c5a1dae2a (at 10.151.19.40@o2ib) [1239478.471528] Lustre: Skipped 3 previous similar messages [1239792.557467] Lustre: nbp8-MDT0000: haven't heard from client 9727839d-a331-9a03-519f-0e6e73e48453 (at 10.151.4.71@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a38cc17400, cur 1591921874 expire 1591921724 last 1591921647 [1239792.629874] Lustre: Skipped 1 previous similar message [1239901.471478] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1239901.505263] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.71@o2ib (335): c: 30, oc: 0, rc: 32 [1240122.479576] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1240122.513358] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.18@o2ib (303): c: 32, oc: 0, rc: 32 [1240209.247842] Lustre: MGS: Connection restored to ed066bd9-0bab-4267-6106-18de508c016d (at 10.149.9.34@o2ib313) [1240209.247847] Lustre: Skipped 495 previous similar messages [1240865.312435] Lustre: MGS: Connection restored to 1bad2a1c-71b2-6d68-8275-de185a0bd104 (at 10.149.12.95@o2ib313) [1240865.312440] Lustre: Skipped 65 previous similar messages [1241382.525712] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1241382.559499] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.160@o2ib (268): c: 32, oc: 0, rc: 32 [1241778.379381] Lustre: MGS: Connection restored to 5cc284fe-7fb7-a495-a4f9-4c923c0dfa1c (at 10.151.7.46@o2ib) [1241778.379387] Lustre: Skipped 259 previous similar messages [1242393.478946] Lustre: MGS: Connection restored to fda8e5ee-e754-938d-ae05-838c5769e465 (at 10.151.54.121@o2ib) [1242393.478952] Lustre: Skipped 249 previous similar messages [1242964.583773] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1242964.617565] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.126@o2ib (282): c: 32, oc: 0, rc: 32 [1242965.583711] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1242965.617480] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.127@o2ib (293): c: 32, oc: 0, rc: 32 [1243076.056474] Lustre: MGS: Connection restored to 437c7112-ab63-f565-4d81-de661cfb5e4f (at 10.151.46.145@o2ib) [1243076.056479] Lustre: Skipped 231 previous similar messages [1243684.617882] Lustre: MGS: Connection restored to ff30373f-cce8-8ca6-bc95-d35fb99246e8 (at 10.149.9.28@o2ib313) [1243684.617888] Lustre: Skipped 143 previous similar messages [1244559.259025] Lustre: MGS: Connection restored to 02a8565c-c940-8e45-bb6d-351f8d9a71cd (at 10.149.15.35@o2ib313) [1244559.259031] Lustre: Skipped 33 previous similar messages [1245212.353003] Lustre: MGS: Connection restored to b845d414-afd1-9ee4-341e-b208a2671f7a (at 10.151.24.170@o2ib) [1245212.353008] Lustre: Skipped 27 previous similar messages [1245219.666419] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1245219.700200] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.167@o2ib (280): c: 32, oc: 0, rc: 32 [1245957.965976] Lustre: MGS: Connection restored to ca5085de-db70-1a7a-8e10-15b78079ff89 (at 10.151.37.191@o2ib) [1245957.965982] Lustre: Skipped 157 previous similar messages [1246831.666834] Lustre: MGS: Connection restored to 1da6cc0c-9c54-1268-ba44-fdacb9d7a6fb (at 10.149.14.35@o2ib313) [1246831.666839] Lustre: Skipped 103 previous similar messages [1247629.777209] Lustre: MGS: Connection restored to 47f579d7-8d88-9057-ddbd-82991a61a86d (at 10.151.23.86@o2ib) [1247629.777215] Lustre: Skipped 453 previous similar messages [1248297.746019] Lustre: MGS: Connection restored to f506748f-fc30-8318-3370-ec13c1c7c70a (at 10.149.9.0@o2ib313) [1248297.746025] Lustre: Skipped 277 previous similar messages [1248453.785239] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1248453.819020] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.122@o2ib (295): c: 32, oc: 0, rc: 32 [1248978.129629] Lustre: MGS: Connection restored to cca92246-0d19-973f-17c5-998925877e3a (at 10.149.5.126@o2ib313) [1248978.129635] Lustre: Skipped 239 previous similar messages [1249202.812910] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1249202.846680] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.226@o2ib (301): c: 32, oc: 0, rc: 32 [1249607.969936] Lustre: MGS: Connection restored to cbd2650c-00f5-9263-b013-63f10410dfae (at 10.151.44.172@o2ib) [1249607.969942] Lustre: Skipped 153 previous similar messages [1250459.329798] Lustre: MGS: Connection restored to 52f47afb-86af-9fd2-2bae-c9124eaa3536 (at 10.151.56.17@o2ib) [1250459.329804] Lustre: Skipped 13 previous similar messages [1250566.952769] Lustre: nbp8-MDT0000: haven't heard from client 3d598bfe-394b-80c9-d48e-05109a36fb66 (at 10.151.30.214@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899f442d8400, cur 1591932648 expire 1591932498 last 1591932421 [1250567.025749] Lustre: Skipped 1 previous similar message [1250674.867125] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1250674.900919] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.214@o2ib (334): c: 30, oc: 0, rc: 32 [1251158.885012] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1251158.918771] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.206@o2ib (291): c: 32, oc: 0, rc: 32 [1251230.108340] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1251230.108346] Lustre: Skipped 101 previous similar messages [1252000.845257] Lustre: MGS: Connection restored to fd3edfc0-ca0c-e650-9cb2-d21373c10fad (at 10.151.9.37@o2ib) [1252000.845262] Lustre: Skipped 9 previous similar messages [1252319.927709] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1252319.961497] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.140@o2ib (303): c: 32, oc: 0, rc: 32 [1252608.809492] Lustre: MGS: Connection restored to fc731f24-ba66-6942-91cd-63e30251407c (at 10.151.14.51@o2ib) [1252608.809498] Lustre: Skipped 47 previous similar messages [1253282.359985] Lustre: MGS: Connection restored to defe6612-f030-95ba-d398-766f955a4501 (at 10.151.56.62@o2ib) [1253282.359991] Lustre: Skipped 25 previous similar messages [1253884.671344] Lustre: MGS: Connection restored to 0aec98a6-da1c-d0f1-5c0c-e6671b3c19ba (at 10.151.11.17@o2ib) [1253884.671350] Lustre: Skipped 149 previous similar messages [1254166.995966] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1254167.029760] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.98@o2ib (303): c: 32, oc: 0, rc: 32 [1254593.933825] Lustre: MGS: Connection restored to 785fa531-43c5-070a-972d-2ccef6fa78ec (at 10.151.10.192@o2ib) [1254593.933830] Lustre: Skipped 101 previous similar messages [1255279.785306] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1255279.785311] Lustre: Skipped 357 previous similar messages [1255759.054725] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1255759.088499] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.98@o2ib (281): c: 32, oc: 0, rc: 32 [1255995.651721] Lustre: MGS: Connection restored to 96effde2-2310-ca44-efec-8594e43a18f2 (at 10.151.11.94@o2ib) [1255995.651726] Lustre: Skipped 1 previous similar message [1256053.065538] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1256053.099324] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.161@o2ib (295): c: 32, oc: 0, rc: 32 [1256617.248907] Lustre: MGS: Connection restored to 53be8f65-513a-6a45-7eae-92dfddc9fd64 (at 10.151.4.186@o2ib) [1256617.248913] Lustre: Skipped 167 previous similar messages [1257226.426583] Lustre: MGS: Connection restored to 17a30872-899a-b60a-7234-141d8f573ba6 (at 10.151.56.53@o2ib) [1257226.426588] Lustre: Skipped 169 previous similar messages [1257853.270714] Lustre: MGS: Connection restored to a7816ba7-2143-6724-aa6b-3357de8a8cc2 (at 10.151.31.53@o2ib) [1257853.270719] Lustre: Skipped 345 previous similar messages [1257982.649326] LNet: 45706:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.37.135@o2ib version 12/12 incarnation 1589208268426647/1591939997299542 [1258466.966455] Lustre: MGS: Connection restored to 60251d0a-03ed-3d84-6031-c3e7f4175eee (at 10.151.4.153@o2ib) [1258466.966461] Lustre: Skipped 213 previous similar messages [1259327.696868] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1259327.696874] Lustre: Skipped 223 previous similar messages [1259518.193443] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1259518.227230] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.124@o2ib (303): c: 32, oc: 0, rc: 32 [1259531.282910] Lustre: nbp8-MDT0000: haven't heard from client 3fbf0f74-e661-670d-2e2f-81f47c3b6de5 (at 10.151.26.3@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3a66fcc00, cur 1591941612 expire 1591941462 last 1591941385 [1259531.355324] Lustre: Skipped 1 previous similar message [1259981.469071] Lustre: MGS: Connection restored to 57df24cf-2516-27b2-e386-5b9c8b97e6bf (at 10.151.19.208@o2ib) [1259981.469077] Lustre: Skipped 5 previous similar messages [1260056.214377] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1260056.248159] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.26.3@o2ib (303): c: 30, oc: 0, rc: 32 [1260830.696591] Lustre: MGS: Connection restored to 6e181442-c786-dad8-5372-f85caf7543a1 (at 10.149.14.43@o2ib313) [1260830.696597] Lustre: Skipped 43 previous similar messages [1261644.330193] Lustre: MGS: Connection restored to e4f417d7-1bd3-6aa3-f9e5-f7a6d62a24a9 (at 10.149.6.63@o2ib313) [1261644.330198] Lustre: Skipped 203 previous similar messages [1262225.293133] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1262225.326922] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.149@o2ib (294): c: 32, oc: 0, rc: 32 [1262228.293240] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1262228.327033] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.155@o2ib (290): c: 32, oc: 0, rc: 32 [1262318.562296] Lustre: MGS: Connection restored to f11c50f6-88f2-487f-9e73-6869d416e7a5 (at 10.151.4.47@o2ib) [1262318.562302] Lustre: Skipped 171 previous similar messages [1262971.413029] Lustre: MGS: haven't heard from client 272c0dc7-273e-4e83-5161-616a39d85af2 (at 10.151.4.100@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a034374800, cur 1591945052 expire 1591944902 last 1591944825 [1262971.483166] Lustre: Skipped 1 previous similar message [1263063.323862] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263063.357632] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.7.74@o2ib (303): c: 30, oc: 0, rc: 32 [1263064.323833] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263064.357615] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.162@o2ib (302): c: 30, oc: 0, rc: 32 [1263066.323897] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263066.357687] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [1263066.391466] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.166@o2ib (305): c: 30, oc: 0, rc: 32 [1263066.432665] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [1263069.324089] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263069.357890] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 8 previous similar messages [1263069.391655] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.25@o2ib (308): c: 30, oc: 0, rc: 32 [1263069.432571] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 8 previous similar messages [1263074.324207] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263074.357991] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1263074.391769] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.48@o2ib (323): c: 30, oc: 0, rc: 32 [1263074.432404] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1263083.324550] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263083.358336] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 17 previous similar messages [1263083.392391] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.160@o2ib (324): c: 30, oc: 0, rc: 32 [1263083.433304] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 17 previous similar messages [1263100.325174] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1263100.358975] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 22 previous similar messages [1263100.393042] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.241@o2ib (341): c: 30, oc: 0, rc: 32 [1263100.433956] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 22 previous similar messages [1263224.576237] Lustre: MGS: Connection restored to b006db7d-713d-dee8-37a5-b5d5dfba7dad (at 10.151.32.69@o2ib) [1263224.576243] Lustre: Skipped 247 previous similar messages [1263781.328292] Process accounting resumed [1264355.674472] Lustre: MGS: Connection restored to b261ef7a-f993-e2cf-4b84-80af52320e47 (at 10.151.11.239@o2ib) [1264355.674478] Lustre: Skipped 51 previous similar messages [1265107.793260] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1265107.793266] Lustre: Skipped 67 previous similar messages [1265861.263634] Lustre: MGS: Connection restored to eda7d6a7-d16c-76eb-61d6-71eb61832f80 (at 10.149.10.68@o2ib313) [1265861.263640] Lustre: Skipped 533 previous similar messages [1266548.021746] Lustre: MGS: Connection restored to c8bb7d45-95f5-bcdf-78d7-b1a9918d2830 (at 10.151.16.44@o2ib) [1266548.021752] Lustre: Skipped 565 previous similar messages [1266660.545498] Lustre: MGS: haven't heard from client 2cd3086e-144e-29c4-fc0b-85b448db9169 (at 10.153.17.232@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897753638800, cur 1591948741 expire 1591948591 last 1591948514 [1266660.616766] Lustre: Skipped 155 previous similar messages [1267148.773108] Lustre: MGS: Connection restored to afecfc2a-cd22-1e7b-3e35-563b8b7cebf8 (at 10.151.9.228@o2ib) [1267148.773113] Lustre: Skipped 13 previous similar messages [1268193.389901] Lustre: MGS: Connection restored to 4c4dcf10-fbfd-77cf-28fc-25a3837530f8 (at 10.151.24.127@o2ib) [1268193.389907] Lustre: Skipped 97 previous similar messages [1268467.610472] Lustre: nbp8-MDT0000: haven't heard from client 732de7f3-9df2-cc01-21ad-541b5397c4b2 (at 10.151.19.10@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89787f13b400, cur 1591950548 expire 1591950398 last 1591950321 [1268467.683172] Lustre: Skipped 1 previous similar message [1268481.366032] LNet: 32505:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.19.10@o2ib version 12/12 incarnation 1591658101109332/1591950489416287 [1268549.524401] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1268549.558195] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 15 previous similar messages [1268549.592268] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.128@o2ib (304): c: 30, oc: 0, rc: 32 [1268549.633477] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 15 previous similar messages [1268555.524635] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1268555.558421] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.164@o2ib (308): c: 30, oc: 0, rc: 32 [1268564.524948] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1268564.558741] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1268564.592527] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.27@o2ib (319): c: 30, oc: 0, rc: 32 [1268564.633447] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1268937.657862] Lustre: MGS: Connection restored to d69ea688-11a3-8115-a352-8054c6fde02b (at 10.151.37.45@o2ib) [1268937.657868] Lustre: Skipped 37 previous similar messages [1269572.626862] Lustre: MGS: Connection restored to e2de0369-2e01-cd45-754b-9ea34749f1b5 (at 10.151.16.197@o2ib) [1269572.626868] Lustre: Skipped 83 previous similar messages [1270489.490316] Lustre: MGS: Connection restored to 90b9b506-24a3-e4a9-280b-1f29f30d9cf6 (at 10.151.8.179@o2ib) [1270489.490322] Lustre: Skipped 101 previous similar messages [1271206.851591] Lustre: MGS: Connection restored to aee43e9a-0a22-e194-3845-f8c672b3c201 (at 10.151.6.61@o2ib) [1271206.851596] Lustre: Skipped 57 previous similar messages [1272163.003374] Lustre: MGS: Connection restored to 0c6ea744-2978-b9bb-43d2-389fb11858b6 (at 10.151.56.10@o2ib) [1272163.003379] Lustre: Skipped 77 previous similar messages [1272881.445311] Lustre: MGS: Connection restored to 49a43936-9b6b-bb2d-d83f-952361430429 (at 10.151.36.173@o2ib) [1272881.445317] Lustre: Skipped 99 previous similar messages [1273514.797778] Lustre: MGS: Connection restored to 0aa39863-1a34-d380-cb8f-52f5a87dfae9 (at 10.151.55.168@o2ib) [1273514.797784] Lustre: Skipped 49 previous similar messages [1274933.419906] Lustre: MGS: Connection restored to 12394f56-8a7a-719e-80d6-892e55f84326 (at 10.149.8.16@o2ib313) [1274933.419912] Lustre: Skipped 33 previous similar messages [1275503.438682] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1275503.438688] Lustre: Skipped 41 previous similar messages [1276779.053005] Lustre: MGS: Connection restored to 0a898fd9-4e1b-db79-ef57-41202b3cea60 (at 10.141.5.98@o2ib417) [1276779.053010] Lustre: Skipped 1 previous similar message [1277031.226822] Lustre: MGS: Connection restored to a9ecfd92-ae50-2b6a-d6e0-8cc4626caa18 (at 10.149.1.208@o2ib313) [1277031.226827] Lustre: Skipped 19 previous similar messages [1277093.050887] Lustre: MGS: Connection restored to f916e044-b548-c8e2-917b-ec2f0f5588e8 (at 10.149.1.232@o2ib313) [1277093.050893] Lustre: Skipped 1 previous similar message [1277270.080865] Lustre: MGS: Connection restored to da2b994c-4108-732a-fe74-ffd6e18b6d78 (at 10.151.30.34@o2ib) [1277270.080871] Lustre: Skipped 85 previous similar messages [1277576.228289] Lustre: MGS: Connection restored to e2121bb8-8ab5-7ec8-fd53-0d6856e6a93f (at 10.149.2.122@o2ib313) [1277576.228295] Lustre: Skipped 85 previous similar messages [1277886.170125] Lustre: MGS: Connection restored to ad4dc9c0-6085-72da-71fd-bb0443510a94 (at 10.149.8.226@o2ib313) [1277886.170130] Lustre: Skipped 365 previous similar messages [1278510.824988] Lustre: MGS: Connection restored to 5453029d-3bac-afc7-521c-fcd60ee38d38 (at 10.141.3.48@o2ib417) [1278510.824994] Lustre: Skipped 181 previous similar messages [1279130.050350] Lustre: MGS: Connection restored to d55c6679-e7b7-71e9-2504-8949b23368a9 (at 10.151.10.76@o2ib) [1279130.050354] Lustre: Skipped 113 previous similar messages [1280080.194311] Lustre: MGS: Connection restored to 86a1022b-c98a-e891-bfd6-b4e089dfe366 (at 10.149.10.65@o2ib313) [1280080.194317] Lustre: Skipped 49 previous similar messages [1280843.535452] Lustre: MGS: Connection restored to fd12613c-a05f-d922-8cce-4cc04aefc3bb (at 10.149.15.33@o2ib313) [1280843.535458] Lustre: Skipped 15 previous similar messages [1281446.209863] Lustre: MGS: Connection restored to cdce4b45-2a9d-b2d3-e51e-00ff93434056 (at 10.151.10.35@o2ib) [1281446.209869] Lustre: Skipped 301 previous similar messages [1282072.529112] Lustre: MGS: Connection restored to 8d0d2229-a082-5e48-3be4-8d9a8d475817 (at 10.151.56.38@o2ib) [1282072.529118] Lustre: Skipped 97 previous similar messages [1282688.995320] Lustre: MGS: Connection restored to f71a07a6-ad4a-6919-4845-c052916b77a8 (at 10.151.19.19@o2ib) [1282688.995326] Lustre: Skipped 13 previous similar messages [1283609.071009] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1283609.071015] Lustre: Skipped 193 previous similar messages [1284384.377382] Lustre: MGS: Connection restored to bd9ba461-200b-1442-d888-cad91bededbd (at 10.149.6.51@o2ib313) [1284384.377388] Lustre: Skipped 61 previous similar messages [1285194.618006] Lustre: MGS: Connection restored to 82142305-0d4a-e4a7-ee3e-2e4e40389030 (at 10.151.5.34@o2ib) [1285194.618011] Lustre: Skipped 107 previous similar messages [1285632.153427] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1285632.187229] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1285632.221001] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.120@o2ib (304): c: 32, oc: 0, rc: 32 [1285632.262210] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1285865.501726] Lustre: MGS: Connection restored to 93f465bd-d702-ea6c-4fa1-07c3b441193f (at 10.151.53.75@o2ib) [1285865.501731] Lustre: Skipped 173 previous similar messages [1286574.114626] Lustre: MGS: Connection restored to 16275719-7362-66f5-9a42-bed74520ba23 (at 10.151.54.136@o2ib) [1286574.114632] Lustre: Skipped 357 previous similar messages [1287217.855083] Lustre: MGS: Connection restored to cd9f5758-b185-83d6-0394-bc7dfed38c74 (at 10.151.8.70@o2ib) [1287217.855087] Lustre: Skipped 39 previous similar messages [1288014.928328] Lustre: MGS: Connection restored to 80d95b86-8ad6-d170-a831-231d3a953fce (at 10.151.36.195@o2ib) [1288014.928334] Lustre: Skipped 203 previous similar messages [1288616.913968] Lustre: MGS: Connection restored to 7e881fb8-6801-f214-3e61-405071649556 (at 10.151.19.13@o2ib) [1288616.913974] Lustre: Skipped 209 previous similar messages [1289223.876045] Lustre: MGS: Connection restored to ebca7ceb-0008-cfc0-99f8-3099bda6b90c (at 10.141.3.212@o2ib417) [1289223.876051] Lustre: Skipped 219 previous similar messages [1289920.286829] Lustre: MGS: Connection restored to 35ef73e1-31c7-74fb-320d-99d9949fd333 (at 10.151.0.144@o2ib) [1289920.286834] Lustre: Skipped 33 previous similar messages [1291106.427164] Lustre: MGS: Connection restored to 3ed0dc6a-c93c-8584-0104-69a87d67ba42 (at 10.151.47.99@o2ib) [1291106.427170] Lustre: Skipped 131 previous similar messages [1291747.434058] Lustre: MGS: Connection restored to 9e9d8e71-e32f-5b4c-aaa4-0b8482e4fc47 (at 10.151.9.91@o2ib) [1291747.434064] Lustre: Skipped 431 previous similar messages [1292620.380400] Lustre: MGS: Connection restored to 7d60999e-ee95-0d24-14ab-c6b6483506a6 (at 10.149.14.106@o2ib313) [1292620.380406] Lustre: Skipped 41 previous similar messages [1293766.034224] Lustre: MGS: Connection restored to b7df03de-38f0-4b7e-c627-60fd8e03b250 (at 10.151.52.177@o2ib) [1293766.034229] Lustre: Skipped 235 previous similar messages [1294370.013294] Lustre: MGS: Connection restored to f707eab3-7a12-b4b5-9989-021b0c16d912 (at 10.141.2.23@o2ib417) [1294370.013300] Lustre: Skipped 223 previous similar messages [1295012.464446] Lustre: MGS: Connection restored to c0f11bb6-96ec-d4d7-d30a-dc9211d8ae95 (at 10.151.44.73@o2ib) [1295012.464451] Lustre: Skipped 315 previous similar messages [1295618.330102] Lustre: MGS: Connection restored to a7e107f2-f059-aa06-cdba-27cb268c181c (at 10.151.16.157@o2ib) [1295618.330108] Lustre: Skipped 73 previous similar messages [1296275.604425] Lustre: MGS: Connection restored to 205df9bd-3934-c435-c62f-6f33386e2922 (at 10.141.3.207@o2ib417) [1296275.604430] Lustre: Skipped 45 previous similar messages [1296940.205112] Lustre: MGS: Connection restored to 02b0a0cb-6761-3a8c-b65e-92c71336cd2a (at 10.151.45.107@o2ib) [1296940.205118] Lustre: Skipped 83 previous similar messages [1297590.786375] Lustre: MGS: Connection restored to 422519c4-fa22-29a9-fe52-aec3f57066f4 (at 10.151.29.231@o2ib) [1297590.786381] Lustre: Skipped 189 previous similar messages [1298265.157304] Lustre: MGS: Connection restored to 2df1234a-09c4-e50e-79fd-6cdc094f4292 (at 10.141.6.160@o2ib417) [1298265.157310] Lustre: Skipped 451 previous similar messages [1299345.206669] Lustre: MGS: Connection restored to 90c81a2b-403e-bbfb-f39d-d84aac56d439 (at 10.151.11.179@o2ib) [1299345.206675] Lustre: Skipped 381 previous similar messages [1299962.679853] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1299962.713636] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.39.229@o2ib (303): c: 32, oc: 0, rc: 32 [1299969.681092] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1299969.714865] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.39.243@o2ib (303): c: 32, oc: 0, rc: 32 [1299981.680523] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1299981.714310] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.40.213@o2ib (243): c: 32, oc: 0, rc: 32 [1300086.811011] Lustre: MGS: Connection restored to 747bada8-f866-68bc-39da-29305e40ced5 (at 10.151.51.137@o2ib) [1300086.811016] Lustre: Skipped 105 previous similar messages [1300766.266986] Lustre: MGS: Connection restored to 8f0c29d9-7b29-18a3-7f97-aebe0a163f55 (at 10.151.6.10@o2ib) [1300766.266992] Lustre: Skipped 183 previous similar messages [1301435.133338] Lustre: MGS: Connection restored to 842e253b-125a-e145-8e52-38fb5063f79f (at 10.151.2.35@o2ib) [1301435.133343] Lustre: Skipped 63 previous similar messages [1301496.737249] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1301496.771033] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.18@o2ib (299): c: 32, oc: 0, rc: 32 [1302170.857795] Lustre: MGS: Connection restored to 0c8ae8d7-561c-2268-8c3c-b2d3081c9624 (at 10.149.1.3@o2ib313) [1302170.857801] Lustre: Skipped 63 previous similar messages [1302789.223055] Lustre: MGS: Connection restored to a141b8d5-a3d7-12a1-064e-ade3828a7a05 (at 10.151.51.165@o2ib) [1302789.223061] Lustre: Skipped 67 previous similar messages [1303412.003300] Lustre: MGS: Connection restored to 9bc59d73-043a-40bc-4ba1-6f7f11eb89ba (at 10.151.18.128@o2ib) [1303412.003306] Lustre: Skipped 413 previous similar messages [1304034.202983] Lustre: MGS: Connection restored to 68fd668c-0a43-520f-6b11-cce831dfd469 (at 10.151.55.164@o2ib) [1304034.202989] Lustre: Skipped 251 previous similar messages [1304649.741180] Lustre: MGS: Connection restored to 116bddf6-e565-feb5-97b0-0b23eeef491d (at 10.151.3.182@o2ib) [1304649.741195] Lustre: Skipped 165 previous similar messages [1305273.753436] Lustre: MGS: Connection restored to 8cd2a015-bddd-7fa9-d40d-b6dc54760a1f (at 10.149.9.200@o2ib313) [1305273.753442] Lustre: Skipped 275 previous similar messages [1305344.877779] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1305344.911564] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.6.49@o2ib (303): c: 32, oc: 0, rc: 32 [1305874.012579] Lustre: nbp8-MDT0000: Connection restored to ffe143ab-f84f-6197-8560-56bf2514d7f5 (at 10.151.0.184@o2ib) [1305874.012584] Lustre: Skipped 754 previous similar messages [1306477.060052] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1306477.060058] Lustre: Skipped 236 previous similar messages [1307098.266273] Lustre: MGS: Connection restored to b3cb8e72-6f1e-8167-bbc6-4f634340b390 (at 10.149.9.44@o2ib313) [1307098.266279] Lustre: Skipped 199 previous similar messages [1307859.206833] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1307859.206837] Lustre: Skipped 163 previous similar messages [1308608.107002] Lustre: MGS: Connection restored to ebca7ceb-0008-cfc0-99f8-3099bda6b90c (at 10.141.3.212@o2ib417) [1308608.107008] Lustre: Skipped 369 previous similar messages [1308962.010768] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1308962.044554] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.136@o2ib (284): c: 32, oc: 0, rc: 32 [1309321.943783] Lustre: MGS: Connection restored to 9ad7a368-726b-33a9-640c-18d74b1de994 (at 10.151.24.141@o2ib) [1309321.943789] Lustre: Skipped 181 previous similar messages [1309969.456800] Lustre: MGS: Connection restored to 25235a61-d976-4763-5fef-4be1eb893064 (at 10.151.8.219@o2ib) [1309969.456806] Lustre: Skipped 143 previous similar messages [1310713.715685] Lustre: MGS: Connection restored to 57bcbe81-cf30-f42b-910a-ca548ccb4bab (at 10.151.49.129@o2ib) [1310713.715691] Lustre: Skipped 457 previous similar messages [1311524.352109] Lustre: MGS: Connection restored to d6281c34-b77f-f04e-2c6b-ec69a56c7edf (at 10.151.29.211@o2ib) [1311524.352115] Lustre: Skipped 89 previous similar messages [1312227.942224] Lustre: MGS: Connection restored to 51aa82c9-6d50-19b4-ae0e-9086e1192334 (at 10.153.10.17@o2ib233) [1312227.942230] Lustre: Skipped 85 previous similar messages [1312918.081590] Lustre: MGS: Connection restored to 9ce536bb-365b-df8e-5bea-749572f7b693 (at 10.151.44.136@o2ib) [1312918.081595] Lustre: Skipped 663 previous similar messages [1313377.264578] Lustre: nbp8-MDT0000: haven't heard from client 90efb129-e945-5f19-730b-8563570c94b7 (at 10.153.12.197@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89915d7ae400, cur 1591995456 expire 1591995306 last 1591995229 [1313377.338417] Lustre: Skipped 19 previous similar messages [1313533.061365] Lustre: MGS: Connection restored to 4fdf7bb7-8295-87cd-27c4-05c7a6a40ef4 (at 10.151.34.193@o2ib) [1313533.061371] Lustre: Skipped 7 previous similar messages [1314137.527656] Lustre: MGS: Connection restored to 47559b20-a6cd-30f6-1d4c-fdbc4a71cbbb (at 10.149.7.237@o2ib313) [1314137.527662] Lustre: Skipped 99 previous similar messages [1314813.856312] Lustre: MGS: Connection restored to 1dd13078-6736-c106-cb46-7bc6e9c83d9f (at 10.149.4.0@o2ib313) [1314813.856318] Lustre: Skipped 139 previous similar messages [1315441.517318] Lustre: MGS: Connection restored to 9ad7a368-726b-33a9-640c-18d74b1de994 (at 10.151.24.141@o2ib) [1315441.517324] Lustre: Skipped 149 previous similar messages [1316313.441682] Lustre: MGS: Connection restored to ac5c2c6e-abc5-aceb-662c-9aaa057abb00 (at 10.151.54.165@o2ib) [1316313.441687] Lustre: Skipped 85 previous similar messages [1316967.053200] Lustre: MGS: Connection restored to 5900e68c-26df-1936-feac-68df729bacf5 (at 10.151.46.17@o2ib) [1316967.053206] Lustre: Skipped 207 previous similar messages [1317734.066332] Lustre: MGS: Connection restored to 7c0009e0-0049-dae3-8b5b-cfa804b992c6 (at 10.149.3.98@o2ib313) [1317734.066337] Lustre: Skipped 39 previous similar messages [1317932.341085] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1317932.374860] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.169@o2ib (236): c: 32, oc: 0, rc: 32 [1318354.736231] Lustre: MGS: Connection restored to 0b835169-08b1-029f-032d-33c02d3fde7b (at 10.151.19.194@o2ib) [1318354.736236] Lustre: Skipped 137 previous similar messages [1318963.721087] Lustre: MGS: Connection restored to 0773b5f7-08d7-b1cf-ba3d-7ce46d84f19b (at 10.149.10.8@o2ib313) [1318963.721092] Lustre: Skipped 79 previous similar messages [1319597.554939] Lustre: MGS: Connection restored to 90330827-a430-0e8e-54f1-f10454b281e7 (at 10.151.29.186@o2ib) [1319597.554945] Lustre: Skipped 75 previous similar messages [1320250.418367] Lustre: MGS: Connection restored to 4f429eaf-a926-a646-113f-9553ddc8dbb5 (at 10.151.23.205@o2ib) [1320250.418373] Lustre: Skipped 401 previous similar messages [1320882.152246] Lustre: MGS: Connection restored to 22790662-46a4-862d-5a4d-6a765d072d42 (at 10.151.54.132@o2ib) [1320882.152252] Lustre: Skipped 605 previous similar messages [1321492.826050] Lustre: MGS: Connection restored to 8ec3315b-1795-289f-24bd-66acea56aed9 (at 10.151.53.51@o2ib) [1321492.826054] Lustre: Skipped 73 previous similar messages [1322183.454621] Lustre: MGS: Connection restored to d21ed641-4c11-643f-7863-e06c8c928934 (at 10.149.16.19@o2ib313) [1322183.454627] Lustre: Skipped 29 previous similar messages [1322814.483909] Lustre: MGS: Connection restored to a59a26d9-fec9-7d40-3586-58dd9e50dad4 (at 10.151.2.91@o2ib) [1322814.483915] Lustre: Skipped 101 previous similar messages [1323597.359460] Lustre: MGS: Connection restored to 81800497-3eb5-9f68-fe68-e4e5a4924d2f (at 10.151.23.34@o2ib) [1323597.359466] Lustre: Skipped 95 previous similar messages [1324105.658207] Lustre: nbp8-MDT0000: haven't heard from client 1cebd4e0-fc7e-c809-4ebb-a108b3b07198 (at 10.151.31.41@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897587bfd000, cur 1592006184 expire 1592006034 last 1592005957 [1324105.730905] Lustre: Skipped 3 previous similar messages [1324190.571063] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1324190.604851] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.29@o2ib (311): c: 30, oc: 0, rc: 32 [1324191.571192] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1324191.604966] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.32@o2ib (312): c: 30, oc: 0, rc: 32 [1324196.571293] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1324196.605080] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.41@o2ib (317): c: 30, oc: 0, rc: 32 [1324201.571550] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1324201.605344] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1324201.638836] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.52@o2ib (323): c: 30, oc: 0, rc: 32 [1324201.679755] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1324327.295530] Lustre: MGS: Connection restored to d91cdd0d-9042-40a3-1bc7-8dc1e22cfa8b (at 10.151.38.23@o2ib) [1324327.295535] Lustre: Skipped 541 previous similar messages [1324666.685316] Lustre: nbp8-MDT0000: haven't heard from client deecd0c0-e0a4-2776-ca7d-7ab94a5c2adf (at 10.151.31.32@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3b34cb400, cur 1592006745 expire 1592006595 last 1592006518 [1324666.758009] Lustre: Skipped 9 previous similar messages [1324745.592595] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1324745.626381] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.29@o2ib (304): c: 30, oc: 0, rc: 32 [1324780.593773] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1324780.627572] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1324780.661067] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.23@o2ib (309): c: 30, oc: 0, rc: 32 [1324780.701981] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1324799.593551] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1324799.627311] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [1324799.661085] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.90@o2ib (341): c: 30, oc: 0, rc: 32 [1324799.702000] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [1324942.763410] Lustre: MGS: Connection restored to 07ec5ceb-8659-0100-b937-242152951e8c (at 10.151.24.147@o2ib) [1324942.763416] Lustre: Skipped 153 previous similar messages [1325580.966142] Lustre: MGS: Connection restored to a86f3324-ec5f-9c4a-4a63-1cf24473421a (at 10.151.30.244@o2ib) [1325580.966156] Lustre: Skipped 527 previous similar messages [1326191.773547] Lustre: MGS: Connection restored to 2af63f60-d7e1-d4cd-5c6b-7711782ac8de (at 10.149.9.64@o2ib313) [1326191.773553] Lustre: Skipped 279 previous similar messages [1326861.857779] Lustre: MGS: Connection restored to 91bffc85-701c-9d9b-b27f-06523fb9b1e2 (at 10.151.55.161@o2ib) [1326861.857785] Lustre: Skipped 67 previous similar messages [1327463.170095] Lustre: MGS: Connection restored to d11a256e-08c2-3404-266b-746299e25e7f (at 10.151.6.201@o2ib) [1327463.170100] Lustre: Skipped 147 previous similar messages [1328088.933768] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1328088.933773] Lustre: Skipped 177 previous similar messages [1328539.838815] LNet: 41791:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.37.214@o2ib version 12/12 incarnation 1591790110184767/1592010556676894 [1328539.886648] LNet: 41791:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 1 previous similar message [1328550.821511] Lustre: nbp8-MDT0000: haven't heard from client d0dd0457-6224-73c7-bf08-89b479d1e59f (at 10.151.37.214@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dc654ac00, cur 1592010629 expire 1592010479 last 1592010402 [1328550.894492] Lustre: Skipped 15 previous similar messages [1328689.367248] Lustre: MGS: Connection restored to 0e2edb1a-3783-2a2c-f67b-475a59575691 (at 10.151.2.31@o2ib) [1328689.367253] Lustre: Skipped 1629 previous similar messages [1329306.925116] Lustre: MGS: Connection restored to f0b6cfba-d913-5dc3-7549-339105906008 (at 10.151.45.99@o2ib) [1329306.925121] Lustre: Skipped 103 previous similar messages [1330005.693119] Lustre: MGS: Connection restored to 2ece8ad5-12c9-70e7-78a0-0e660947cd49 (at 10.151.13.178@o2ib) [1330005.693125] Lustre: Skipped 565 previous similar messages [1330122.879163] Lustre: nbp8-MDT0000: haven't heard from client 84d63d54-7a27-363c-3dcb-1f8f35def1db (at 10.151.59.207@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a406d87400, cur 1592012201 expire 1592012051 last 1592011974 [1330122.952128] Lustre: Skipped 3 previous similar messages [1330127.881340] Lustre: MGS: haven't heard from client 13d6f54b-7d18-36cf-5f0e-706cef56e93b (at 10.151.56.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897fb491c000, cur 1592012206 expire 1592012056 last 1592011979 [1330127.951729] Lustre: Skipped 1 previous similar message [1330204.792224] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1330204.826010] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.56.204@o2ib (302): c: 31, oc: 0, rc: 32 [1330236.794421] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1330236.828209] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.207@o2ib (333): c: 31, oc: 0, rc: 32 [1330544.805714] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1330544.839499] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.71@o2ib (274): c: 32, oc: 0, rc: 32 [1330677.767405] Lustre: MGS: Connection restored to 95fad7ae-0c8a-93d8-ab9b-d66a57168f9d (at 10.151.29.205@o2ib) [1330677.767411] Lustre: Skipped 195 previous similar messages [1331396.157031] Lustre: MGS: Connection restored to 7ef5207b-87e4-f4c9-7f55-65fa0302a665 (at 10.151.38.140@o2ib) [1331396.157036] Lustre: Skipped 183 previous similar messages [1332055.650531] Lustre: MGS: Connection restored to 91bffc85-701c-9d9b-b27f-06523fb9b1e2 (at 10.151.55.161@o2ib) [1332055.650537] Lustre: Skipped 175 previous similar messages [1332749.571319] Lustre: MGS: Connection restored to 35fef62e-2f9c-4cbe-fec7-167b43c70a5b (at 10.151.38.143@o2ib) [1332749.571324] Lustre: Skipped 9 previous similar messages [1333485.189735] Lustre: MGS: Connection restored to 1f933efa-9bd9-efae-26e2-47b55d044ef7 (at 10.151.47.184@o2ib) [1333485.189740] Lustre: Skipped 9 previous similar messages [1334225.116210] Lustre: MGS: Connection restored to f60ff9de-2f22-be6a-1fc0-e7b2b8739136 (at 10.151.54.151@o2ib) [1334225.116216] Lustre: Skipped 25 previous similar messages [1334844.264788] Lustre: MGS: Connection restored to 28e1c1ff-409a-1af9-2980-1051d7649810 (at 10.151.6.32@o2ib) [1334844.264793] Lustre: Skipped 67 previous similar messages [1335631.078153] Lustre: MGS: Connection restored to ac5c2c6e-abc5-aceb-662c-9aaa057abb00 (at 10.151.54.165@o2ib) [1335631.078159] Lustre: Skipped 773 previous similar messages [1336284.052672] Lustre: MGS: Connection restored to 37661759-e76a-09c8-9e6b-2d913685e930 (at 10.151.35.31@o2ib) [1336284.052677] Lustre: Skipped 335 previous similar messages [1337086.697443] Lustre: MGS: Connection restored to c81ceb9a-57db-738f-4859-b8577b295a5a (at 10.149.3.97@o2ib313) [1337086.697449] Lustre: Skipped 39 previous similar messages [1337728.068816] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1337728.102593] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.55.154@o2ib (303): c: 32, oc: 0, rc: 32 [1337915.165739] Lustre: nbp8-MDT0000: haven't heard from client a9dbd05c-d18c-fb51-810e-369e4c1a5bfa (at 10.151.31.156@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89827d549c00, cur 1592019993 expire 1592019843 last 1592019766 [1337915.238704] Lustre: Skipped 1 previous similar message [1338041.080279] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1338041.114062] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.156@o2ib (352): c: 30, oc: 0, rc: 32 [1338231.053933] Lustre: MGS: Connection restored to c8389b21-ab77-fc80-6fac-b0a1857b9be9 (at 10.151.33.99@o2ib) [1338231.053938] Lustre: Skipped 57 previous similar messages [1338994.317978] Lustre: MGS: Connection restored to b260ae90-6c04-9c95-15cb-c5d3eecf0a79 (at 10.151.48.13@o2ib) [1338994.317984] Lustre: Skipped 365 previous similar messages [1339668.389191] Lustre: MGS: Connection restored to 730b0707-4ba1-b0a3-4aa1-f2eb65832693 (at 10.151.0.131@o2ib) [1339668.389197] Lustre: Skipped 31 previous similar messages [1340013.152978] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1340013.186765] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.132@o2ib (293): c: 32, oc: 0, rc: 32 [1340313.423216] Lustre: MGS: Connection restored to a4e6229e-974d-5ae9-ec60-e651a22902fc (at 10.151.14.188@o2ib) [1340313.423221] Lustre: Skipped 445 previous similar messages [1341365.936258] Lustre: MGS: Connection restored to 8a7bbcc7-801d-9288-83ca-475b677dda41 (at 10.151.36.124@o2ib) [1341365.936264] Lustre: Skipped 117 previous similar messages [1341968.485890] Lustre: MGS: Connection restored to ed71334e-02c0-8898-c74d-aed5a51ca31e (at 10.149.6.47@o2ib313) [1341968.485897] Lustre: Skipped 43 previous similar messages [1342579.813793] Lustre: MGS: Connection restored to fd3edfc0-ca0c-e650-9cb2-d21373c10fad (at 10.151.9.37@o2ib) [1342579.813799] Lustre: Skipped 65 previous similar messages [1343184.490927] Lustre: MGS: Connection restored to cef6baf0-a1ac-aee2-2e6c-4dce436d65a9 (at 10.151.8.64@o2ib) [1343184.490933] Lustre: Skipped 181 previous similar messages [1343887.277714] Lustre: MGS: Connection restored to 0e54d02d-a4d0-c998-eaab-ed1ffd952cd6 (at 10.151.38.237@o2ib) [1343887.277720] Lustre: Skipped 99 previous similar messages [1344544.488381] Lustre: MGS: Connection restored to 13ac01cf-05a3-c74f-4c61-30344b0aa07b (at 10.151.46.177@o2ib) [1344544.488387] Lustre: Skipped 49 previous similar messages [1345224.345707] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1345224.379494] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.151@o2ib (301): c: 32, oc: 0, rc: 32 [1345341.826958] Lustre: MGS: Connection restored to 88a9ca71-cc20-82bb-7d87-6ca071ee74c5 (at 10.149.15.190@o2ib313) [1345341.826964] Lustre: Skipped 7 previous similar messages [1346223.779398] Lustre: MGS: Connection restored to 660ba076-5154-32dc-b42f-66b5cd9ac67f (at 10.149.9.222@o2ib313) [1346223.779404] Lustre: Skipped 137 previous similar messages [1346829.386189] Lustre: MGS: Connection restored to 6d5ca307-b728-7c58-8f07-1f601acba2ed (at 10.151.24.21@o2ib) [1346829.386194] Lustre: Skipped 73 previous similar messages [1347745.454358] Lustre: MGS: Connection restored to 51203d9d-13be-9e02-38e4-da5055ebbee4 (at 10.151.38.68@o2ib) [1347745.454365] Lustre: Skipped 29 previous similar messages [1348352.141910] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1348352.141916] Lustre: Skipped 105 previous similar messages [1349164.054479] Lustre: MGS: Connection restored to 04e9d2b1-31a0-a115-00dc-ae8961a2728e (at 10.151.37.25@o2ib) [1349164.054485] Lustre: Skipped 17 previous similar messages [1350164.840187] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1350164.840193] Lustre: Skipped 49 previous similar messages [1350185.349273] Process accounting resumed [1350820.470936] Lustre: MGS: Connection restored to 9cad6e28-f656-dded-fc81-8f4ba3fb7be7 (at 10.141.7.49@o2ib417) [1350820.470941] Lustre: Skipped 151 previous similar messages [1351812.290049] Lustre: MGS: Connection restored to 338d5461-eeed-d07c-8126-52a525217570 (at 10.151.14.171@o2ib) [1351812.290054] Lustre: Skipped 141 previous similar messages [1352613.359566] Lustre: MGS: Connection restored to 1f25fbbf-c2a4-4eaa-efe1-7b6e9dd2074e (at 10.151.49.92@o2ib) [1352613.359572] Lustre: Skipped 165 previous similar messages [1353273.608373] Lustre: MGS: Connection restored to 0ef69f83-228b-e68d-b1da-44ad1e3f6056 (at 10.149.8.212@o2ib313) [1353273.608379] Lustre: Skipped 103 previous similar messages [1353985.101519] Lustre: MGS: Connection restored to 94c3239b-c1f8-6598-a1a8-3979ea5de17e (at 10.151.17.47@o2ib) [1353985.101525] Lustre: Skipped 85 previous similar messages [1354679.403363] Lustre: MGS: Connection restored to c3ad4cf8-48d2-d8ff-07d6-6c2c79de41fb (at 10.151.46.175@o2ib) [1354679.403369] Lustre: Skipped 63 previous similar messages [1356331.370332] Lustre: MGS: Connection restored to f760c0c3-cfee-ff98-6787-6511bc51045d (at 10.151.16.136@o2ib) [1356331.370337] Lustre: Skipped 169 previous similar messages [1356447.363088] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1356447.363094] Lustre: Skipped 43 previous similar messages [1357158.166658] Lustre: MGS: Connection restored to 064721c5-bfe1-78cb-6715-f8b67995f990 (at 10.149.9.35@o2ib313) [1357158.166663] Lustre: Skipped 19 previous similar messages [1358199.329643] Lustre: MGS: Connection restored to 13933e69-5f9b-7b7a-2693-fc1bbc1569cb (at 10.151.32.10@o2ib) [1358199.329649] Lustre: Skipped 59 previous similar messages [1358276.305643] Lustre: MGS: Connection restored to d1a8eb99-3df9-8134-e1ae-768c910f8519 (at 10.149.1.78@o2ib313) [1358276.305649] Lustre: Skipped 239 previous similar messages [1358771.385647] Lustre: MGS: Connection restored to 9570049d-2c13-9daf-a0f6-650a73255625 (at 10.149.9.105@o2ib313) [1358771.385653] Lustre: Skipped 47 previous similar messages [1359533.216843] Lustre: MGS: Connection restored to 6ce568a7-e258-71b4-8236-207cc90fa8e3 (at 10.151.11.97@o2ib) [1359533.216849] Lustre: Skipped 73 previous similar messages [1359974.974922] Lustre: nbp8-MDT0000: haven't heard from client 52419bc7-a7f6-1f7b-5876-c3b1af02c991 (at 10.151.10.43@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89960c2ed000, cur 1592042052 expire 1592041902 last 1592041825 [1359975.047612] Lustre: Skipped 1 previous similar message [1360062.853644] Lustre: MGS: Connection restored to b879d486-90b6-e98d-a0f6-7efd1b704fe9 (at 10.151.28.100@o2ib) [1360062.853649] Lustre: Skipped 1 previous similar message [1360064.889064] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1360064.922853] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.183@o2ib (305): c: 30, oc: 0, rc: 32 [1360065.889047] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1360065.922849] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.184@o2ib (305): c: 30, oc: 0, rc: 32 [1360067.889103] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1360067.922872] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1360067.956361] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.188@o2ib (308): c: 30, oc: 0, rc: 32 [1360067.997566] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1360070.889207] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1360070.923001] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1360070.956775] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.194@o2ib (312): c: 30, oc: 0, rc: 32 [1360070.997974] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1360092.890015] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1360092.923799] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1360092.957573] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.43@o2ib (345): c: 30, oc: 0, rc: 32 [1360092.998491] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1360107.890590] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1360107.924370] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.167@o2ib (346): c: 30, oc: 0, rc: 32 [1360437.356006] Lustre: MGS: Connection restored to 1ca6c330-f149-a58f-69f8-4c6c1cbdab5b (at 10.141.2.5@o2ib417) [1360437.356012] Lustre: Skipped 11 previous similar messages [1361035.926199] Lustre: MGS: Connection restored to 047fd9df-a01b-ebf1-b73e-c7d0be607061 (at 10.151.24.146@o2ib) [1361035.926205] Lustre: Skipped 3 previous similar messages [1361258.977802] Lustre: MGS: Connection restored to 199671bb-7b30-93cb-9541-11952b2e9720 (at 10.149.15.227@o2ib313) [1361258.977808] Lustre: Skipped 7 previous similar messages [1361826.953524] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1361826.987317] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1361827.021096] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.52.132@o2ib (304): c: 32, oc: 0, rc: 32 [1361827.062305] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1361958.663910] Lustre: MGS: Connection restored to e7211b94-cb22-a945-ee99-be2dfdf9820c (at 10.149.16.54@o2ib313) [1361958.663915] Lustre: Skipped 93 previous similar messages [1362658.914015] Lustre: MGS: Connection restored to f667ac5c-614a-4f58-df05-d1fa1cfb0135 (at 10.149.9.31@o2ib313) [1362658.914028] Lustre: Skipped 129 previous similar messages [1363587.840100] Lustre: MGS: Connection restored to 781aa790-f031-de61-e376-f489a75d8eed (at 10.151.54.118@o2ib) [1363587.840106] Lustre: Skipped 59 previous similar messages [1364317.135264] Lustre: nbp8-MDT0000: haven't heard from client 5aede946-7ee3-cdd1-fda9-49f72facbe54 (at 10.151.38.135@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ff6c4000, cur 1592046394 expire 1592046244 last 1592046167 [1364317.208243] Lustre: Skipped 27 previous similar messages [1364430.050234] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1364430.084028] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.135@o2ib (339): c: 30, oc: 0, rc: 32 [1364552.891966] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1364552.891972] Lustre: Skipped 97 previous similar messages [1365153.029484] Lustre: MGS: Connection restored to a344d52c-f494-300c-6f24-f127bfcb5c67 (at 10.151.3.126@o2ib) [1365153.029489] Lustre: Skipped 1 previous similar message [1365784.271344] Lustre: MGS: Connection restored to 3d789c2e-d33e-2d73-9f73-15a6327028f3 (at 10.151.3.122@o2ib) [1365784.271350] Lustre: Skipped 721 previous similar messages [1366404.753104] Lustre: MGS: Connection restored to 041ecd90-5245-4515-2fa8-a752500f34ba (at 10.149.16.41@o2ib313) [1366404.753109] Lustre: Skipped 109 previous similar messages [1366735.134124] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1366735.167911] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.204@o2ib (301): c: 32, oc: 0, rc: 32 [1366742.134388] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1366742.168172] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.23.216@o2ib (284): c: 32, oc: 0, rc: 32 [1366784.136043] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1366784.169822] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.94@o2ib (303): c: 32, oc: 0, rc: 32 [1366792.136290] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1366792.170076] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1366792.203564] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.16.190@o2ib (274): c: 32, oc: 0, rc: 32 [1366792.244764] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1366818.137279] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1366818.171065] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.5.28@o2ib (303): c: 32, oc: 0, rc: 32 [1367019.643260] Lustre: MGS: Connection restored to 000e2457-e80b-7bac-2e6c-9e9282e25268 (at 10.151.6.77@o2ib) [1367019.643265] Lustre: Skipped 123 previous similar messages [1367253.250600] Lustre: MGS: haven't heard from client ae5f610b-c143-b8ee-9baf-257d8f7c0c37 (at 10.151.10.94@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89681b75a400, cur 1592049330 expire 1592049180 last 1592049103 [1367253.320731] Lustre: Skipped 1 previous similar message [1367340.156388] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1367340.190181] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1367340.223670] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.10.94@o2ib (312): c: 31, oc: 0, rc: 32 [1367340.264584] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1367866.170642] Lustre: MGS: Connection restored to 194c025a-3583-a5f3-1733-2048b65d5a76 (at 10.151.19.84@o2ib) [1367866.170648] Lustre: Skipped 49 previous similar messages [1368556.217402] Lustre: MGS: Connection restored to 6c045588-b0d0-7587-e81a-8fb0e4f61d95 (at 10.151.46.42@o2ib) [1368556.217407] Lustre: Skipped 41 previous similar messages [1369480.429252] Lustre: MGS: Connection restored to 89e2c365-fcb2-5a16-8e07-a8c10bd4f953 (at 10.149.14.5@o2ib313) [1369480.429258] Lustre: Skipped 5 previous similar messages [1370102.073653] Lustre: MGS: Connection restored to 62a330f4-eeb4-59ce-5957-5acce799830e (at 10.151.34.98@o2ib) [1370102.073659] Lustre: Skipped 627 previous similar messages [1370979.005698] Lustre: MGS: Connection restored to 87c10dba-06ea-dfc2-bfb5-fe84d0fa66ff (at 10.151.13.63@o2ib) [1370979.005703] Lustre: Skipped 55 previous similar messages [1371804.150819] Lustre: MGS: Connection restored to 194f87cf-2dfd-9e22-a39a-2a1fe7314c52 (at 10.151.35.119@o2ib) [1371804.150825] Lustre: Skipped 23 previous similar messages [1372500.581843] Lustre: MGS: Connection restored to 4fd6ac9c-fef0-b139-533b-162ffaebcd2d (at 10.151.31.212@o2ib) [1372500.581849] Lustre: Skipped 13 previous similar messages [1373630.758077] Lustre: MGS: Connection restored to 00010a90-a9f4-ff01-1ac5-626ce7470ed4 (at 10.151.44.93@o2ib) [1373630.758084] Lustre: Skipped 11 previous similar messages [1374439.206825] Lustre: MGS: Connection restored to 79960ef8-f288-a1a5-588f-0c8c728a9d6d (at 10.149.14.114@o2ib313) [1374439.206831] Lustre: Skipped 155 previous similar messages [1374635.520283] Lustre: MGS: haven't heard from client 090f53e9-4728-aa98-4898-10e0b7d31365 (at 10.151.27.23@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a387845400, cur 1592056712 expire 1592056562 last 1592056485 [1374635.590406] Lustre: Skipped 3 previous similar messages [1374642.516550] Lustre: nbp8-MDT0000: haven't heard from client 538dc196-c8b5-d5ec-e312-2bc60646af8b (at 10.151.27.23@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c42874800, cur 1592056719 expire 1592056569 last 1592056492 [1374663.430398] LNet: 77837:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.23@o2ib version 12/12 incarnation 1590129565769723/1592056734455370 [1374663.477935] LNet: 77837:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Skipped 1 previous similar message [1375223.150952] Lustre: MGS: Connection restored to b47d6a12-83e7-6369-dbaa-77e65ecbfa6c (at 10.151.55.96@o2ib) [1375223.150958] Lustre: Skipped 133 previous similar messages [1375937.984702] Lustre: MGS: Connection restored to 7254d0d2-948e-e40d-fb1e-02b516106534 (at 10.149.15.71@o2ib313) [1375937.984708] Lustre: Skipped 81 previous similar messages [1376739.138407] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1376739.138412] Lustre: Skipped 19 previous similar messages [1377632.323603] Lustre: MGS: Connection restored to 8fe3d7c0-b5cb-814a-faa8-79b99f674488 (at 10.151.36.119@o2ib) [1377632.323608] Lustre: Skipped 3 previous similar messages [1377952.547728] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1377952.581528] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1377952.615020] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.91@o2ib (304): c: 32, oc: 0, rc: 32 [1377952.655934] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1378408.236169] Lustre: MGS: Connection restored to 50014ec2-8371-ec58-49fe-20e23770ef85 (at 10.151.28.206@o2ib) [1378408.236175] Lustre: Skipped 281 previous similar messages [1378746.576910] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1378746.610695] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.62@o2ib (303): c: 32, oc: 0, rc: 32 [1379012.575452] Lustre: MGS: Connection restored to f0a2a298-faf5-a153-1c33-33bfe7f95f10 (at 10.149.1.20@o2ib313) [1379012.575457] Lustre: Skipped 43 previous similar messages [1379664.642123] Lustre: MGS: Connection restored to 3942d62e-e023-1e0b-81d1-b8330a62f5f0 (at 10.149.10.45@o2ib313) [1379664.642128] Lustre: Skipped 79 previous similar messages [1380361.063387] Lustre: MGS: Connection restored to 0a465548-46a8-8f3f-514b-ca61f8babd14 (at 10.141.6.95@o2ib417) [1380361.063393] Lustre: Skipped 425 previous similar messages [1380970.399257] Lustre: MGS: Connection restored to d8522f51-9eb8-3e86-a1b0-60ed40952839 (at 10.151.3.179@o2ib) [1380970.399263] Lustre: Skipped 81 previous similar messages [1381814.166294] Lustre: MGS: Connection restored to a0f7c022-aec8-3c10-c32d-5f087e6f9a03 (at 10.151.37.126@o2ib) [1381814.166300] Lustre: Skipped 729 previous similar messages [1382519.557307] Lustre: MGS: Connection restored to bf1f48d7-5c75-7edd-f9b8-79a53048f73b (at 10.151.47.22@o2ib) [1382519.557313] Lustre: Skipped 15 previous similar messages [1383173.182359] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1383173.182365] Lustre: Skipped 175 previous similar messages [1384654.812388] Lustre: MGS: Connection restored to 907b55f3-f455-8f9d-ecfa-1a42fa8045c5 (at 10.151.45.105@o2ib) [1384654.812393] Lustre: Skipped 139 previous similar messages [1384843.676016] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1384843.676021] Lustre: Skipped 39 previous similar messages [1385429.777793] Lustre: MGS: Connection restored to 2bbe1ad8-f61c-0faa-17a9-ab1c89910a55 (at 10.151.55.82@o2ib) [1385429.777799] Lustre: Skipped 1 previous similar message [1385850.307416] Lustre: MGS: Connection restored to a793fb57-d10b-2041-bf04-befe96d56c28 (at 10.151.55.114@o2ib) [1385850.307422] Lustre: Skipped 151 previous similar messages [1386566.085034] Lustre: MGS: Connection restored to 85222724-004f-61a4-5c14-81d3db44c5ca (at 10.151.3.39@o2ib) [1386566.085040] Lustre: Skipped 131 previous similar messages [1387410.047290] Lustre: MGS: Connection restored to cfd65234-a1c6-a41d-a9f5-92798a0b911e (at 10.151.3.35@o2ib) [1387410.047296] Lustre: Skipped 255 previous similar messages [1388021.928039] Lustre: MGS: Connection restored to cd9f5758-b185-83d6-0394-bc7dfed38c74 (at 10.151.8.70@o2ib) [1388021.928045] Lustre: Skipped 387 previous similar messages [1388642.806885] Lustre: MGS: Connection restored to 0fc69a1c-b1ee-628e-fe46-7b6324d82031 (at 10.151.12.149@o2ib) [1388642.806892] Lustre: Skipped 281 previous similar messages [1388822.035973] Lustre: nbp8-MDT0000: haven't heard from client 7eeeb692-878d-94cb-c8f9-217c7042cf0c (at 10.151.1.48@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3f32e4000, cur 1592070898 expire 1592070748 last 1592070671 [1388892.949266] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1388892.983041] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.53@o2ib (279): c: 32, oc: 0, rc: 32 [1388896.949399] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1388896.983180] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.61@o2ib (299): c: 32, oc: 0, rc: 32 [1388899.949427] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1388899.983205] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.66@o2ib (303): c: 30, oc: 0, rc: 32 [1388902.949538] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1388902.983322] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1388903.016802] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.72@o2ib (306): c: 30, oc: 0, rc: 32 [1388903.057433] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1388939.950990] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1388939.984776] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 9 previous similar messages [1388940.018544] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.46@o2ib (344): c: 30, oc: 0, rc: 32 [1388940.059172] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 9 previous similar messages [1388948.951219] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1388948.984989] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 14 previous similar messages [1388949.019042] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.63@o2ib (348): c: 30, oc: 0, rc: 32 [1388949.059670] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 14 previous similar messages [1389542.772808] Lustre: MGS: Connection restored to 5b8c5586-a8d5-f154-b3b9-8b83cbe96d33 (at 10.151.4.93@o2ib) [1389542.772813] Lustre: Skipped 89 previous similar messages [1390336.228117] Lustre: MGS: Connection restored to 25fc276c-f702-f1c3-3d6d-d0d9b2a831be (at 10.151.7.53@o2ib) [1390336.228132] Lustre: Skipped 259 previous similar messages [1390942.404965] Lustre: MGS: Connection restored to d241e15f-cf93-e88e-16cd-4d485bea372c (at 10.149.14.174@o2ib313) [1390942.404971] Lustre: Skipped 109 previous similar messages [1391982.367831] Lustre: MGS: Connection restored to a5000f76-b630-e3e6-937f-89732301666c (at 10.151.35.103@o2ib) [1391982.367837] Lustre: Skipped 69 previous similar messages [1392755.190079] Lustre: MGS: haven't heard from client 9a22eb28-2f49-184b-581b-14d131e2b3e0 (at 10.151.28.223@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c7bf13000, cur 1592074831 expire 1592074681 last 1592074604 [1392755.260468] Lustre: Skipped 67 previous similar messages [1392769.180651] Lustre: nbp8-MDT0000: haven't heard from client 97a018cb-473d-6d21-5af6-3ddb2533a808 (at 10.151.28.223@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fefadc800, cur 1592074845 expire 1592074695 last 1592074618 [1392769.259843] Lustre: MGS: Connection restored to 05d9c1b2-3fb0-6488-eb80-1f181c49a1f0 (at 10.141.6.185@o2ib417) [1392769.259847] Lustre: Skipped 173 previous similar messages [1392889.097032] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1392889.130801] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [1392889.164573] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.223@o2ib (347): c: 26, oc: 0, rc: 32 [1392889.205787] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [1393453.827411] Lustre: MGS: Connection restored to 000e2457-e80b-7bac-2e6c-9e9282e25268 (at 10.151.6.77@o2ib) [1393453.827417] Lustre: Skipped 213 previous similar messages [1394236.244653] Lustre: MGS: haven't heard from client 3b7591c1-171d-8b24-ef17-0d24bfebc703 (at 10.151.33.207@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898009817000, cur 1592076312 expire 1592076162 last 1592076085 [1394247.237586] Lustre: nbp8-MDT0000: haven't heard from client 1f3342ca-221e-2b94-802d-223c09e68f26 (at 10.151.33.207@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898009816c00, cur 1592076323 expire 1592076173 last 1592076096 [1394329.149053] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1394329.182843] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.207@o2ib (308): c: 32, oc: 0, rc: 32 [1394345.148056] Lustre: MGS: Connection restored to 58cb0626-c080-45ce-3eab-8e37b3e0456b (at 10.151.8.185@o2ib) [1394345.148062] Lustre: Skipped 17 previous similar messages [1395081.004185] Lustre: MGS: Connection restored to aec4ebbf-b9d5-a232-0513-99426e17af64 (at 10.151.18.52@o2ib) [1395081.004190] Lustre: Skipped 81 previous similar messages [1395683.141179] Lustre: MGS: Connection restored to 442b9118-ef33-91ac-8315-5bf1838972a7 (at 10.151.8.68@o2ib) [1395683.141185] Lustre: Skipped 15 previous similar messages [1396495.578676] Lustre: MGS: Connection restored to 25fc276c-f702-f1c3-3d6d-d0d9b2a831be (at 10.151.7.53@o2ib) [1396495.578682] Lustre: Skipped 17 previous similar messages [1397266.879997] Lustre: MGS: Connection restored to b27d8a93-f4e3-f8b3-2e13-74734d6fe5b7 (at 10.149.9.110@o2ib313) [1397266.880002] Lustre: Skipped 11 previous similar messages [1397940.482211] Lustre: MGS: Connection restored to a1e56010-61e7-d875-34d3-12907ad3b907 (at 10.149.7.183@o2ib313) [1397940.482217] Lustre: Skipped 75 previous similar messages [1398642.631228] Lustre: MGS: Connection restored to 3e3c3a74-55b6-9121-cb25-258d669e542b (at 10.151.5.101@o2ib) [1398642.631234] Lustre: Skipped 263 previous similar messages [1399382.480041] Lustre: MGS: Connection restored to aa8255d2-878a-679f-4732-45aff153d6fd (at 10.151.46.19@o2ib) [1399382.480046] Lustre: Skipped 37 previous similar messages [1400041.567697] Lustre: MGS: Connection restored to 103e93ee-2218-c21b-6b6b-2447415cfc93 (at 10.151.56.115@o2ib) [1400041.567702] Lustre: Skipped 727 previous similar messages [1400697.385874] Lustre: MGS: Connection restored to 9d9462ef-317f-954b-23dd-556beedadc6a (at 10.151.11.195@o2ib) [1400697.385879] Lustre: Skipped 3 previous similar messages [1401456.194960] Lustre: MGS: Connection restored to f15d1991-5820-6603-a809-234699c9a8b9 (at 10.151.45.189@o2ib) [1401456.194966] Lustre: Skipped 141 previous similar messages [1402222.945314] Lustre: MGS: Connection restored to 000e2457-e80b-7bac-2e6c-9e9282e25268 (at 10.151.6.77@o2ib) [1402222.945320] Lustre: Skipped 43 previous similar messages [1402947.232309] Lustre: MGS: Connection restored to 202522c8-9f84-c78e-c801-667ce7fe478c (at 10.151.35.35@o2ib) [1402947.232315] Lustre: Skipped 63 previous similar messages [1404000.979044] Lustre: MGS: Connection restored to 583c5b78-ed7f-f639-f056-87b39fa07d8c (at 10.151.30.176@o2ib) [1404000.979050] Lustre: Skipped 35 previous similar messages [1404645.126396] Lustre: MGS: Connection restored to 43b146f5-82a9-8d93-aa94-8c2863d8d0fb (at 10.151.33.66@o2ib) [1404645.126401] Lustre: Skipped 35 previous similar messages [1404667.530817] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1404667.564612] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.193@o2ib (236): c: 32, oc: 0, rc: 32 [1404687.530438] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1404687.564232] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.82@o2ib (288): c: 32, oc: 0, rc: 32 [1405302.796721] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1405302.796727] Lustre: Skipped 143 previous similar messages [1405918.206043] Lustre: MGS: Connection restored to e78f1aa0-ceb2-a3a9-aed7-445f0c482121 (at 10.151.19.191@o2ib) [1405918.206049] Lustre: Skipped 21 previous similar messages [1406542.010578] Lustre: MGS: Connection restored to 452efcb7-7627-bce2-1d65-ec156827488c (at 10.151.7.70@o2ib) [1406542.010584] Lustre: Skipped 53 previous similar messages [1407217.701359] Lustre: MGS: Connection restored to d03f47f7-0d04-3e96-eaf5-3356daa14143 (at 10.151.8.51@o2ib) [1407217.701364] Lustre: Skipped 5 previous similar messages [1408106.735086] Lustre: MGS: Connection restored to ad8cb262-0906-cba2-2490-0f5885d788e2 (at 10.151.18.10@o2ib) [1408106.735092] Lustre: Skipped 281 previous similar messages [1408758.636882] Lustre: MGS: Connection restored to ea462a93-7ae4-1a21-88d4-dc0ef6aafa15 (at 10.149.15.124@o2ib313) [1408758.636888] Lustre: Skipped 143 previous similar messages [1409366.379244] Lustre: MGS: Connection restored to 16c9d6b9-0b0b-68b0-084f-60e7ecae8df0 (at 10.151.49.136@o2ib) [1409366.379249] Lustre: Skipped 103 previous similar messages [1410002.918956] Lustre: MGS: Connection restored to 3e3c3a74-55b6-9121-cb25-258d669e542b (at 10.151.5.101@o2ib) [1410002.918961] Lustre: Skipped 57 previous similar messages [1410606.221267] Lustre: MGS: Connection restored to 6609ad8d-4189-ba2b-669f-c379cd628d3b (at 10.149.12.158@o2ib313) [1410606.221273] Lustre: Skipped 5 previous similar messages [1411323.809695] Lustre: MGS: Connection restored to dfb52300-ce7e-7caa-1c19-62dfd5d7650c (at 10.151.39.108@o2ib) [1411323.809701] Lustre: Skipped 81 previous similar messages [1412000.315546] Lustre: MGS: Connection restored to 60f94c13-6f4d-9245-72c1-3a1d2e49038d (at 10.151.3.41@o2ib) [1412000.315552] Lustre: Skipped 367 previous similar messages [1412842.232402] Lustre: MGS: Connection restored to bb019e78-5cbf-186b-2863-02099c5b2441 (at 10.151.23.10@o2ib) [1412842.232407] Lustre: Skipped 261 previous similar messages [1413354.847952] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1413354.881739] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.30@o2ib (303): c: 32, oc: 0, rc: 32 [1413630.612234] Lustre: MGS: Connection restored to 2222bfbc-6612-753a-6147-2d89a488098e (at 10.151.14.90@o2ib) [1413630.612241] Lustre: Skipped 697 previous similar messages [1414338.738798] Lustre: MGS: Connection restored to eb275f06-9ba3-d95c-2036-7edc217bbaf5 (at 10.151.53.42@o2ib) [1414338.738803] Lustre: Skipped 679 previous similar messages [1415608.236134] Lustre: MGS: Connection restored to 781aa790-f031-de61-e376-f489a75d8eed (at 10.151.54.118@o2ib) [1415608.236141] Lustre: Skipped 119 previous similar messages [1416036.771920] Lustre: MGS: Connection restored to 8fd18b61-e10d-b023-dd15-2b7973d6fb51 (at 10.151.55.17@o2ib) [1416036.771926] Lustre: Skipped 17 previous similar messages [1416352.525037] Lustre: MGS: Connection restored to b27d8a93-f4e3-f8b3-2e13-74734d6fe5b7 (at 10.149.9.110@o2ib313) [1416352.525043] Lustre: Skipped 33 previous similar messages [1416833.143821] Lustre: MGS: Connection restored to 722752a0-051a-6358-d2f2-574aaed5ea0e (at 10.151.42.155@o2ib) [1416833.143827] Lustre: Skipped 153 previous similar messages [1417441.143178] Lustre: MGS: Connection restored to edba9f51-cf87-3622-27cf-2856ae5b2587 (at 10.151.3.51@o2ib) [1417441.143184] Lustre: Skipped 407 previous similar messages [1418113.648516] Lustre: MGS: Connection restored to 81df8a59-e8bc-88a0-d563-cb62440a832e (at 10.149.14.149@o2ib313) [1418113.648522] Lustre: Skipped 199 previous similar messages [1419091.206227] Lustre: MGS: Connection restored to e7c0c45e-3885-a530-7fd8-8b4d4a2e6045 (at 10.151.3.40@o2ib) [1419091.206233] Lustre: Skipped 61 previous similar messages [1419758.359381] Lustre: MGS: Connection restored to 6873ce44-79d7-2b1a-0051-4f3849c71206 (at 10.151.30.90@o2ib) [1419758.359386] Lustre: Skipped 147 previous similar messages [1420387.811657] Lustre: MGS: Connection restored to c7a65991-5805-d447-e1bc-a85ab7ae8e26 (at 10.151.45.124@o2ib) [1420387.811663] Lustre: Skipped 503 previous similar messages [1420994.298964] Lustre: MGS: Connection restored to b91e2ad0-266d-bac8-313a-0965ad7d6803 (at 10.151.49.95@o2ib) [1420994.298970] Lustre: Skipped 35 previous similar messages [1421252.137687] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1421252.171465] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.57.235@o2ib (314): c: 31, oc: 0, rc: 32 [1421263.138186] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1421263.171956] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1421263.205450] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.58.204@o2ib (330): c: 31, oc: 0, rc: 32 [1421263.246656] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1421277.138603] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1421277.172383] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.58.231@o2ib (343): c: 31, oc: 0, rc: 32 [1421617.866005] Lustre: MGS: Connection restored to 25a90e87-6a1d-ed04-7c8e-9434dd558de9 (at 10.141.6.119@o2ib417) [1421617.866011] Lustre: Skipped 201 previous similar messages [1422237.383181] Lustre: MGS: Connection restored to 943b3ee5-fc44-904f-1750-0995663eded3 (at 10.151.54.135@o2ib) [1422237.383187] Lustre: Skipped 327 previous similar messages [1423089.388947] Lustre: MGS: Connection restored to 8b4fa7a0-dec8-022a-9d92-9d9dc0dbf539 (at 10.151.3.170@o2ib) [1423089.388953] Lustre: Skipped 19 previous similar messages [1423693.558301] Lustre: MGS: Connection restored to b23268b8-7787-d26e-403a-da9ace664697 (at 10.151.23.197@o2ib) [1423693.558306] Lustre: Skipped 167 previous similar messages [1424718.594143] Lustre: MGS: Connection restored to df32c7c6-fec0-1dee-20f9-b6663e53e203 (at 10.151.0.57@o2ib) [1424718.594148] Lustre: Skipped 1 previous similar message [1425330.628111] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1425330.628116] Lustre: Skipped 21 previous similar messages [1426666.307500] Lustre: MGS: Connection restored to 38d4e283-944f-6ed9-9c5f-a79657660b19 (at 10.151.33.187@o2ib) [1426666.307506] Lustre: Skipped 29 previous similar messages [1427594.916599] Lustre: MGS: Connection restored to cac79241-51bb-2aff-10e1-aa3d8155ec94 (at 10.151.2.33@o2ib) [1427594.916604] Lustre: Skipped 23 previous similar messages [1427793.390502] Lustre: MGS: Connection restored to a8a5a740-b7e8-25ed-6d05-d9144387cc20 (at 10.151.36.132@o2ib) [1427793.390507] Lustre: Skipped 1 previous similar message [1428440.837639] Lustre: MGS: Connection restored to c3e41625-7382-ced5-cd88-3ba3c8b9a5f1 (at 10.151.32.22@o2ib) [1428440.837645] Lustre: Skipped 1 previous similar message [1428447.526505] Lustre: MGS: Connection restored to 3b7f05e0-c745-7209-4e5e-15f647456d2f (at 10.151.33.116@o2ib) [1428447.526511] Lustre: Skipped 1 previous similar message [1428452.326268] Lustre: nbp8-MDT0000: Connection restored to 4b98ae3c-7c49-1ceb-d876-7705a65735c2 (at 10.151.28.87@o2ib) [1428452.326273] Lustre: Skipped 251 previous similar messages [1428810.296416] Lustre: MGS: Connection restored to e97c7aa7-9d96-8806-afa1-3d5c3e609419 (at 10.151.38.200@o2ib) [1428810.296422] Lustre: Skipped 57 previous similar messages [1429253.758252] Lustre: MGS: Connection restored to 0e9ac13d-a20c-0f6b-844c-4ca4d70a187d (at 10.151.49.114@o2ib) [1429253.758257] Lustre: Skipped 179 previous similar messages [1429377.737229] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1429377.737235] Lustre: Skipped 7 previous similar messages [1429466.831003] Lustre: MGS: Connection restored to 7b707d12-42e7-db53-22a5-8638c997d307 (at 10.151.37.96@o2ib) [1429466.831009] Lustre: Skipped 29 previous similar messages [1429750.679138] Lustre: MGS: Connection restored to cb6163c2-0455-cca3-3f84-26cf8bc24846 (at 10.151.4.30@o2ib) [1429750.679144] Lustre: Skipped 17 previous similar messages [1431446.807833] Lustre: MGS: Connection restored to 468bc333-415b-297d-d005-614c180deca7 (at 10.151.11.55@o2ib) [1431446.807838] Lustre: Skipped 311 previous similar messages [1431630.564901] Lustre: MGS: Connection restored to 32a67060-739f-d506-c266-32aa46b1ad96 (at 10.151.54.148@o2ib) [1431630.564906] Lustre: Skipped 27 previous similar messages [1431782.372947] Lustre: MGS: Connection restored to 537a72bc-1d5b-a000-ba93-037dfddb2fa4 (at 10.149.9.194@o2ib313) [1431782.372953] Lustre: Skipped 5 previous similar messages [1432298.492220] Lustre: MGS: Connection restored to 2bfa45ce-61c6-8dbb-3934-0e11a28db24a (at 10.149.1.69@o2ib313) [1432298.492225] Lustre: Skipped 371 previous similar messages [1432994.083383] Lustre: MGS: Connection restored to 1443b15f-30d9-9cd2-b47b-d2e86371287f (at 10.149.11.181@o2ib313) [1432994.083389] Lustre: Skipped 109 previous similar messages [1434197.076713] Lustre: MGS: Connection restored to 503b419a-d44e-65bb-f096-0af2983b95f4 (at 10.151.55.118@o2ib) [1434197.076719] Lustre: Skipped 63 previous similar messages [1435240.996327] Lustre: MGS: Connection restored to 0e79e130-42d0-ec3d-b04e-c9aae7889139 (at 10.151.10.93@o2ib) [1435240.996332] Lustre: Skipped 23 previous similar messages [1435568.752533] Lustre: nbp8-MDT0000: haven't heard from client dcff34d0-e6fa-8a1b-cf6e-04f6a3c532a1 (at 10.151.3.234@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898283237400, cur 1592117643 expire 1592117493 last 1592117416 [1435632.665615] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1435632.699400] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.22@o2ib (303): c: 32, oc: 0, rc: 32 [1435650.666379] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1435650.700146] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.234@o2ib (308): c: 30, oc: 0, rc: 32 [1435661.666683] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1435661.700453] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.11@o2ib (308): c: 30, oc: 0, rc: 32 [1435923.155534] Lustre: MGS: Connection restored to 55948856-744f-1487-bd78-1001a8290df3 (at 10.151.3.234@o2ib) [1435923.155539] Lustre: Skipped 21 previous similar messages [1436528.789714] Lustre: MGS: haven't heard from client c45359cf-f911-a477-fd7a-598ee4212272 (at 10.151.15.21@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89922ffc2800, cur 1592118603 expire 1592118453 last 1592118376 [1436528.859869] Lustre: Skipped 3 previous similar messages [1436529.789779] Lustre: nbp8-MDT0000: haven't heard from client 8cf7927c-f610-c048-2b0e-3a6fef757ea3 (at 10.151.15.21@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89813d013800, cur 1592118604 expire 1592118454 last 1592118377 [1436598.846482] Lustre: MGS: Connection restored to 9bde665a-7b50-d628-3a6d-766a2b0deb49 (at 10.151.15.25@o2ib) [1436598.846488] Lustre: Skipped 133 previous similar messages [1436625.702025] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1436625.735817] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.21@o2ib (321): c: 31, oc: 0, rc: 32 [1436681.807177] Lustre: MGS: haven't heard from client 8f517394-2099-4b3b-e669-92b06cbdcfe9 (at 10.151.9.136@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997b7320800, cur 1592118756 expire 1592118606 last 1592118529 [1436757.801681] Lustre: MGS: haven't heard from client 739652dd-95cf-31d0-7a5f-73770fd9ca89 (at 10.151.15.17@o2ib) in 223 seconds. I think it's dead, and I am evicting it. exp ffff896b2da4a400, cur 1592118832 expire 1592118682 last 1592118609 [1436757.871808] Lustre: Skipped 1 previous similar message [1436767.690154] Process accounting resumed [1436773.707471] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1436773.741257] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.9.136@o2ib (317): c: 31, oc: 0, rc: 32 [1436876.711148] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1436876.744935] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.17@o2ib (341): c: 30, oc: 0, rc: 32 [1437466.332100] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1437466.332106] Lustre: Skipped 91 previous similar messages [1438072.344835] Lustre: MGS: Connection restored to 469924b1-c96c-95d1-5ab8-5dd603f6bb75 (at 10.151.2.126@o2ib) [1438072.344841] Lustre: Skipped 57 previous similar messages [1438313.866352] Lustre: nbp8-MDT0000: haven't heard from client d552c270-2c1b-bddb-ef32-bf03c41f6b6d (at 10.151.2.22@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e04e1b000, cur 1592120388 expire 1592120238 last 1592120161 [1438313.938743] Lustre: Skipped 1 previous similar message [1438396.766806] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1438396.800587] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.22@o2ib (309): c: 30, oc: 0, rc: 32 [1438398.767876] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1438398.801668] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.26@o2ib (304): c: 30, oc: 0, rc: 32 [1438399.768018] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1438399.801790] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.29@o2ib (305): c: 30, oc: 0, rc: 32 [1438418.768619] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1438418.802413] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1438418.836200] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.66@o2ib (325): c: 30, oc: 0, rc: 32 [1438418.876835] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1438492.770320] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1438492.804112] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.113@o2ib (341): c: 30, oc: 0, rc: 32 [1438918.876002] Lustre: nbp8-MDT0000: haven't heard from client 36c588c0-55e7-0ea2-6774-56cb3c010f48 (at 10.151.3.53@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973c5747c00, cur 1592120993 expire 1592120843 last 1592120766 [1438918.948415] Lustre: Skipped 15 previous similar messages [1438994.789706] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1438994.823500] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1438994.856994] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.53@o2ib (303): c: 30, oc: 0, rc: 32 [1438994.897622] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1439033.790140] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1439033.823923] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.239@o2ib (334): c: 30, oc: 0, rc: 32 [1439077.244056] Lustre: MGS: Connection restored to a652f244-366c-e7d7-8f1b-1a98d30ccc72 (at 10.149.14.12@o2ib313) [1439077.244062] Lustre: Skipped 75 previous similar messages [1439700.915966] Lustre: MGS: Connection restored to 2ef2eaf3-34a2-0f30-748a-94cba4cbe9cd (at 10.149.16.71@o2ib313) [1439700.915972] Lustre: Skipped 425 previous similar messages [1440385.992609] Lustre: MGS: Connection restored to 231564b5-968c-c926-8a70-4284c6e34551 (at 10.153.10.14@o2ib233) [1440385.992621] Lustre: Skipped 399 previous similar messages [1440988.299800] Lustre: MGS: Connection restored to a457f2c9-3b00-2b44-b52e-1bf99a80c295 (at 10.141.5.214@o2ib417) [1440988.299806] Lustre: Skipped 2431 previous similar messages [1441589.071785] Lustre: MGS: Connection restored to 7fc63384-a679-5fd2-0354-b62ec70e9168 (at 10.149.14.87@o2ib313) [1441589.071791] Lustre: Skipped 226 previous similar messages [1442208.965117] Lustre: MGS: Connection restored to 14046a6d-534c-8409-1e72-51b9da6da8f1 (at 10.151.3.71@o2ib) [1442208.965123] Lustre: Skipped 231 previous similar messages [1442540.007955] Lustre: nbp8-MDT0000: haven't heard from client 087bcf79-e538-3ea3-0ffd-9481e05854f6 (at 10.151.3.234@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897932fe2000, cur 1592124614 expire 1592124464 last 1592124387 [1442540.080645] Lustre: Skipped 3 previous similar messages [1442619.921608] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1442619.955390] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.234@o2ib (306): c: 30, oc: 0, rc: 32 [1442630.921908] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1442630.955691] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.15.11@o2ib (311): c: 30, oc: 0, rc: 32 [1442880.092690] Lustre: MGS: Connection restored to 15896e2d-4e9c-f383-adfc-728a779ac9da (at 10.151.3.235@o2ib) [1442880.092696] Lustre: Skipped 945 previous similar messages [1443742.149279] Lustre: MGS: Connection restored to 4dfcd5e1-025a-bbf4-e906-e976e70a797d (at 10.151.34.126@o2ib) [1443742.149284] Lustre: Skipped 1 previous similar message [1443964.060753] Lustre: MGS: haven't heard from client 1f0bbe56-f216-5cb6-6b6f-b33ed6cd84ba (at 10.151.4.119@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979c2a7e800, cur 1592126038 expire 1592125888 last 1592125811 [1443964.130862] Lustre: Skipped 3 previous similar messages [1444053.974141] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1444054.007929] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.119@o2ib (316): c: 30, oc: 0, rc: 32 [1444441.499847] Lustre: MGS: Connection restored to 32a13c32-8cbb-54f0-0860-58f10db333e6 (at 10.151.0.204@o2ib) [1444441.499852] Lustre: Skipped 17 previous similar messages [1445040.100490] Lustre: nbp8-MDT0000: haven't heard from client 73ba4842-c4cf-8fd2-2cda-4439a5b6158b (at 10.151.36.163@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8999b3bf5400, cur 1592127114 expire 1592126964 last 1592126887 [1445040.173494] Lustre: Skipped 1 previous similar message [1445042.757257] Lustre: MGS: Connection restored to 4f40a53c-43ac-d13f-a93b-47357c96c9fc (at 10.151.3.158@o2ib) [1445042.757263] Lustre: Skipped 239 previous similar messages [1445128.013537] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1445128.047332] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.99@o2ib (303): c: 32, oc: 0, rc: 32 [1445136.013760] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1445136.047534] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.96@o2ib (311): c: 30, oc: 0, rc: 32 [1445139.014975] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1445139.048760] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1445139.082249] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.157@o2ib (325): c: 30, oc: 0, rc: 32 [1445139.123451] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1445142.013948] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1445142.047724] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1445142.081496] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.108@o2ib (319): c: 30, oc: 0, rc: 32 [1445142.122697] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1445148.014266] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1445148.048044] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1445148.081811] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.20@o2ib (333): c: 30, oc: 0, rc: 32 [1445148.122733] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1445158.014535] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1445158.048315] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [1445158.082083] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.39@o2ib (335): c: 30, oc: 0, rc: 32 [1445158.123003] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [1445175.015187] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1445175.048959] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 15 previous similar messages [1445175.083020] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.73@o2ib (351): c: 30, oc: 0, rc: 32 [1445175.123933] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 15 previous similar messages [1445341.110521] Lustre: nbp8-MDT0000: haven't heard from client b57a86c6-2585-d51f-a660-28ded9f9a5d2 (at 10.151.3.154@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ff886c00, cur 1592127415 expire 1592127265 last 1592127188 [1445341.183251] Lustre: Skipped 71 previous similar messages [1445459.025598] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1445459.059372] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1445459.093153] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.154@o2ib (345): c: 30, oc: 0, rc: 32 [1445459.134066] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1445724.794759] Lustre: MGS: Connection restored to f7009779-bfa8-c702-b247-e09b244ef954 (at 10.151.54.150@o2ib) [1445724.794765] Lustre: Skipped 43 previous similar messages [1446358.672080] Lustre: MGS: Connection restored to aa36cf04-f7d0-1dad-af54-2e9558f38b88 (at 10.149.15.43@o2ib313) [1446358.672086] Lustre: Skipped 287 previous similar messages [1447342.718986] Lustre: MGS: Connection restored to b930058a-3d9c-8e6e-ac53-15d5bc088355 (at 10.151.33.84@o2ib) [1447342.718992] Lustre: Skipped 933 previous similar messages [1448130.600031] Lustre: MGS: Connection restored to 9d2d2c9b-8bf4-144c-2483-590e261c8c66 (at 10.141.2.141@o2ib417) [1448130.600037] Lustre: Skipped 229 previous similar messages [1448834.828120] Lustre: MGS: Connection restored to 2a413cba-84d7-8405-6e2c-0ca8fe5cb429 (at 10.151.3.225@o2ib) [1448834.828127] Lustre: Skipped 149 previous similar messages [1449629.708607] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1449629.708619] Lustre: Skipped 821 previous similar messages [1450346.110454] Lustre: MGS: Connection restored to 3557b774-90d4-ce5b-c7c8-f7c2e0678d91 (at 10.149.3.112@o2ib313) [1450346.110459] Lustre: Skipped 109 previous similar messages [1451126.965256] Lustre: MGS: Connection restored to 55948856-744f-1487-bd78-1001a8290df3 (at 10.151.3.234@o2ib) [1451126.965262] Lustre: Skipped 667 previous similar messages [1451788.689935] Lustre: MGS: Connection restored to dd8a6541-e24d-aae9-b6e2-cbc204d1069f (at 10.151.0.172@o2ib) [1451788.689941] Lustre: Skipped 35 previous similar messages [1452390.762390] Lustre: MGS: Connection restored to fe4b77bc-b2f5-70ce-3c16-9f5696c97e20 (at 10.141.2.14@o2ib417) [1452390.762395] Lustre: Skipped 99 previous similar messages [1453255.690901] Lustre: MGS: Connection restored to b930058a-3d9c-8e6e-ac53-15d5bc088355 (at 10.151.33.84@o2ib) [1453255.690907] Lustre: Skipped 151 previous similar messages [1453904.553954] Lustre: MGS: Connection restored to 084bee6b-ec4f-00bd-3d1e-1f737f3f7e2d (at 10.151.35.137@o2ib) [1453904.553961] Lustre: Skipped 221 previous similar messages [1454702.332773] Lustre: MGS: Connection restored to 6c17f618-2d06-e981-b330-8cabff742991 (at 10.149.2.222@o2ib313) [1454702.332779] Lustre: Skipped 129 previous similar messages [1455545.163418] Lustre: MGS: Connection restored to 10e121aa-7dc5-f8c8-4ddf-59094b4dc3b3 (at 10.151.14.176@o2ib) [1455545.163424] Lustre: Skipped 757 previous similar messages [1456323.734746] Lustre: MGS: Connection restored to 03636d43-3fa8-877a-89bd-3494956943fc (at 10.149.11.161@o2ib313) [1456323.734751] Lustre: Skipped 61 previous similar messages [1457422.345951] Lustre: MGS: Connection restored to 7ba125db-5cee-1fd1-dff1-70b03e137241 (at 10.141.2.1@o2ib417) [1457422.345956] Lustre: Skipped 159 previous similar messages [1458068.483950] Lustre: MGS: Connection restored to aabaed7e-f697-84f4-1eb3-7184f4cd0875 (at 10.149.12.174@o2ib313) [1458068.483956] Lustre: Skipped 351 previous similar messages [1458668.506058] Lustre: MGS: Connection restored to 99b62cf6-5272-31bc-6564-d02713593cf4 (at 10.151.50.110@o2ib) [1458668.506064] Lustre: Skipped 5 previous similar messages [1459550.365006] Lustre: MGS: Connection restored to 982498cf-7cec-73fc-b910-5794ac24fce1 (at 10.141.3.96@o2ib417) [1459550.365012] Lustre: Skipped 149 previous similar messages [1460152.075785] Lustre: MGS: Connection restored to 934e3348-4b8e-b3cd-a666-cf2dab0bb5f4 (at 10.151.10.121@o2ib) [1460152.075791] Lustre: Skipped 245 previous similar messages [1461806.552190] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1461806.552195] Lustre: Skipped 3 previous similar messages [1462114.881383] Lustre: MGS: Connection restored to e209360d-feff-e507-d2f3-f9c9bb02a681 (at 10.141.2.138@o2ib417) [1462114.881388] Lustre: Skipped 1 previous similar message [1462560.002826] Lustre: MGS: Connection restored to b28d8a9e-22a0-1703-dc7a-37724d9b59d9 (at 10.151.11.209@o2ib) [1462560.002831] Lustre: Skipped 639 previous similar messages [1462969.052223] Lustre: MGS: Connection restored to f4d45193-26e2-3c60-0740-2ec2604d37f9 (at 10.149.5.13@o2ib313) [1462969.052228] Lustre: Skipped 1 previous similar message [1463765.304291] Lustre: MGS: Connection restored to ea462a93-7ae4-1a21-88d4-dc0ef6aafa15 (at 10.149.15.124@o2ib313) [1463765.304297] Lustre: Skipped 413 previous similar messages [1464593.716024] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [1464593.716032] Lustre: Skipped 209 previous similar messages [1465342.781147] Lustre: MGS: Connection restored to e316ee0b-b686-0c57-b2d7-51c53f93f360 (at 10.141.2.188@o2ib417) [1465342.781156] Lustre: Skipped 109 previous similar messages [1466010.046182] Lustre: MGS: Connection restored to 672f5146-477f-c0ad-1db3-b32db52b737a (at 10.153.10.86@o2ib233) [1466010.046188] Lustre: Skipped 3 previous similar messages [1466651.896415] Lustre: MGS: haven't heard from client 8c3d7b12-4637-e136-765b-8bd7f427e44b (at 10.153.17.192@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dbb4a2800, cur 1592148725 expire 1592148575 last 1592148498 [1466651.967682] Lustre: Skipped 15 previous similar messages [1466818.922664] Lustre: MGS: Connection restored to 1d09530b-bea8-d782-ed93-18b711cf1e57 (at 10.151.28.39@o2ib) [1466818.922669] Lustre: Skipped 669 previous similar messages [1467730.638539] Lustre: MGS: Connection restored to 1e3c8c80-050e-f68e-da69-2fdeabc394ea (at 10.151.23.194@o2ib) [1467730.638545] Lustre: Skipped 71 previous similar messages [1468396.321080] Lustre: MGS: Connection restored to 378b41ee-c419-3f59-41dc-3b5d390a3aff (at 10.151.35.134@o2ib) [1468396.321085] Lustre: Skipped 45 previous similar messages [1469888.198245] Lustre: MGS: Connection restored to 7f6351fe-0f58-ee8e-383a-52b2d115fef6 (at 10.149.11.67@o2ib313) [1469888.198251] Lustre: Skipped 7 previous similar messages [1470097.807460] Lustre: MGS: Connection restored to 5e89f64c-ef31-49ad-a872-706f27ed46b2 (at 10.151.39.115@o2ib) [1470097.807466] Lustre: Skipped 61 previous similar messages [1470358.744097] Lustre: MGS: Connection restored to 4b6a7d12-4e75-915e-01ab-6405b0f2ec32 (at 10.151.38.216@o2ib) [1470358.744102] Lustre: Skipped 13 previous similar messages [1471006.056085] Lustre: nbp8-MDT0000: haven't heard from client f0f94607-877c-d0b0-467a-424324089c53 (at 10.153.16.47@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ff4bcac00, cur 1592153079 expire 1592152929 last 1592152852 [1471006.129640] Lustre: Skipped 1 previous similar message [1471008.057811] Lustre: MGS: haven't heard from client 6ad8adbe-834c-0bb4-2b20-93e4eaee5022 (at 10.153.16.47@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89991d688400, cur 1592153081 expire 1592152931 last 1592152854 [1471008.128781] Lustre: Skipped 1 previous similar message [1471474.409305] Lustre: MGS: Connection restored to ebc483da-cf36-90db-80fb-d15802a91627 (at 10.151.12.220@o2ib) [1471474.409311] Lustre: Skipped 1011 previous similar messages [1471670.347573] Lustre: MGS: Connection restored to fe10dc02-8ee5-df33-6cc2-9f3eab2dd3c1 (at 10.141.5.144@o2ib417) [1471670.347578] Lustre: Skipped 1 previous similar message [1471757.267658] Lustre: MGS: Connection restored to 47fc7a36-f569-c415-5345-985f9af11063 (at 10.149.14.14@o2ib313) [1471757.267664] Lustre: Skipped 101 previous similar messages [1471917.682613] Lustre: MGS: Connection restored to 441cab8f-3eec-93f3-5546-d0f01d26b0ee (at 10.149.14.54@o2ib313) [1471917.682619] Lustre: Skipped 39 previous similar messages [1472257.100181] Lustre: MGS: Connection restored to 683ca5ac-36a7-d66e-0496-efcb39123583 (at 10.151.8.221@o2ib) [1472257.100187] Lustre: Skipped 3 previous similar messages [1472753.121673] Lustre: MGS: haven't heard from client a234f8ae-ae53-2544-214f-2d1fdc26c7b5 (at 10.153.16.107@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f20aa5400, cur 1592154826 expire 1592154676 last 1592154599 [1472753.192926] Lustre: Skipped 1 previous similar message [1473147.533693] Lustre: MGS: Connection restored to 6d281757-4b3b-1ab3-a57b-c9316f193db7 (at 10.151.12.120@o2ib) [1473147.533699] Lustre: Skipped 113 previous similar messages [1473740.154576] Lustre: nbp8-MDT0000: haven't heard from client c7b474bc-1847-e4d8-efc0-e33a40de6749 (at 10.153.17.203@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8975efed8c00, cur 1592155813 expire 1592155663 last 1592155586 [1473740.228417] Lustre: Skipped 1 previous similar message [1473978.804209] Lustre: MGS: Connection restored to 9c47b3b0-6367-cf13-a2c6-09f0d1f1d9c7 (at 10.149.15.247@o2ib313) [1473978.804215] Lustre: Skipped 239 previous similar messages [1474383.180397] Lustre: MGS: haven't heard from client e85f5c45-dcc0-4daf-5f56-7364c771e12b (at 10.153.17.203@o2ib233) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897e96244c00, cur 1592156456 expire 1592156306 last 1592156229 [1474383.251645] Lustre: Skipped 1 previous similar message [1474857.130060] Lustre: MGS: Connection restored to 739ed43c-e636-fa5b-925b-24c8bcb008f3 (at 10.149.9.191@o2ib313) [1474857.130065] Lustre: Skipped 65 previous similar messages [1475679.224593] Lustre: MGS: Connection restored to 775a0410-ef22-d30e-3662-79bb529f578e (at 10.151.3.140@o2ib) [1475679.224601] Lustre: Skipped 103 previous similar messages [1476300.400889] Lustre: MGS: Connection restored to e4f417d7-1bd3-6aa3-f9e5-f7a6d62a24a9 (at 10.149.6.63@o2ib313) [1476300.400895] Lustre: Skipped 75 previous similar messages [1477260.741991] Lustre: MGS: Connection restored to 75e14dc5-81c8-ee59-f32c-8a8f7ffffc4f (at 10.151.32.41@o2ib) [1477260.741997] Lustre: Skipped 59 previous similar messages [1477989.779815] Lustre: MGS: Connection restored to e03425fe-754a-37b1-dce8-bd1553e00e46 (at 10.151.7.99@o2ib) [1477989.779821] Lustre: Skipped 201 previous similar messages [1478655.075290] Lustre: MGS: Connection restored to e490ac58-e260-5cdb-cd9f-7a02e1370793 (at 10.149.14.64@o2ib313) [1478655.075296] Lustre: Skipped 239 previous similar messages [1479587.884070] Lustre: MGS: Connection restored to 950b269d-8789-07a7-221c-6a0fdde32f67 (at 10.151.36.126@o2ib) [1479587.884076] Lustre: Skipped 17 previous similar messages [1480497.313587] Lustre: MGS: Connection restored to ed71334e-02c0-8898-c74d-aed5a51ca31e (at 10.149.6.47@o2ib313) [1480497.313593] Lustre: Skipped 111 previous similar messages [1481130.345937] Lustre: MGS: Connection restored to 754fd4ad-250f-da52-8c98-5066a1ca2eb2 (at 10.149.11.178@o2ib313) [1481130.345942] Lustre: Skipped 67 previous similar messages [1481742.704442] Lustre: MGS: Connection restored to c167a5ce-467e-01dc-9fc4-55d517dca0f4 (at 10.151.2.238@o2ib) [1481742.704448] Lustre: Skipped 121 previous similar messages [1482355.481687] Lustre: MGS: Connection restored to 5c721e20-8bb3-687a-e4de-04423192c342 (at 10.149.4.8@o2ib313) [1482355.481693] Lustre: Skipped 71 previous similar messages [1482955.801930] Lustre: MGS: Connection restored to 5eafeff5-9a9f-99b1-0eff-dae2a1f2b762 (at 10.151.8.96@o2ib) [1482955.801936] Lustre: Skipped 254 previous similar messages [1484236.679284] Lustre: MGS: Connection restored to 7ec41a49-ad28-6755-09ed-874681eb33dc (at 10.151.47.241@o2ib) [1484236.679290] Lustre: Skipped 129 previous similar messages [1484444.196652] Lustre: MGS: Connection restored to 669db2e5-ee46-ebc5-9bf0-d18e8845726b (at 10.149.15.55@o2ib313) [1484444.196657] Lustre: Skipped 3 previous similar messages [1485125.705799] Lustre: MGS: Connection restored to 8a78965f-3076-735f-834e-b24564745d49 (at 10.149.11.99@o2ib313) [1485125.705806] Lustre: Skipped 59 previous similar messages [1485647.343694] Lustre: MGS: Connection restored to ad4dc9c0-6085-72da-71fd-bb0443510a94 (at 10.149.8.226@o2ib313) [1485647.343700] Lustre: Skipped 63 previous similar messages [1486288.562318] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1486288.562324] Lustre: Skipped 67 previous similar messages [1487498.877800] Lustre: MGS: Connection restored to f81fae66-1ca5-b47a-f754-10e422139c93 (at 10.151.28.204@o2ib) [1487498.877805] Lustre: Skipped 3 previous similar messages [1487874.088547] Lustre: MGS: Connection restored to 03cdded8-3e97-5d42-7b8e-908354f81ba6 (at 10.151.56.201@o2ib) [1487874.088552] Lustre: Skipped 7 previous similar messages [1488220.862050] Lustre: MGS: Connection restored to fa4c9591-48ef-8281-b181-4159c5c23fca (at 10.151.0.89@o2ib) [1488220.862056] Lustre: Skipped 7 previous similar messages [1488711.703519] Lustre: nbp8-MDT0000: haven't heard from client 40369657-ac02-8368-daa3-259f3ffb04b8 (at 10.151.2.160@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f1b823000, cur 1592170784 expire 1592170634 last 1592170557 [1488711.776191] Lustre: Skipped 1 previous similar message [1488815.618401] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1488815.652172] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [1488815.685954] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.160@o2ib (331): c: 30, oc: 0, rc: 32 [1488815.726876] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [1488891.426699] Lustre: MGS: Connection restored to 5fa7ded7-fbf6-4cfa-b7f1-2b8dfd1d8c4f (at 10.141.6.8@o2ib417) [1488891.426705] Lustre: Skipped 45 previous similar messages [1489546.276246] Lustre: MGS: Connection restored to 7ea6a74d-99cf-b378-d191-d47876ff0b02 (at 10.151.14.134@o2ib) [1489546.276252] Lustre: Skipped 143 previous similar messages [1490373.264661] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1490373.264667] Lustre: Skipped 21 previous similar messages [1491489.585335] Lustre: MGS: Connection restored to 064721c5-bfe1-78cb-6715-f8b67995f990 (at 10.149.9.35@o2ib313) [1491489.585339] Lustre: Skipped 63 previous similar messages [1493036.742519] Lustre: MGS: Connection restored to f81fae66-1ca5-b47a-f754-10e422139c93 (at 10.151.28.204@o2ib) [1493036.742525] Lustre: Skipped 59 previous similar messages [1493155.236857] Lustre: MGS: Connection restored to aadae798-ade3-e5d6-a30a-e14fec678004 (at 10.151.16.159@o2ib) [1493155.236863] Lustre: Skipped 59 previous similar messages [1493377.240980] Lustre: MGS: Connection restored to be486df2-4e1a-f133-377c-eeb725d85e49 (at 10.149.9.95@o2ib313) [1493377.240986] Lustre: Skipped 7 previous similar messages [1493873.282394] Lustre: MGS: Connection restored to d8d0f282-a64f-744d-7d98-1704f6dd1822 (at 10.149.14.0@o2ib313) [1493873.282400] Lustre: Skipped 259 previous similar messages [1494590.542612] Lustre: MGS: Connection restored to dd513b5e-9132-b2ee-0f21-cf3def5d482e (at 10.151.30.138@o2ib) [1494590.542618] Lustre: Skipped 63 previous similar messages [1495208.967653] Lustre: MGS: Connection restored to 2e16af16-0843-8bbe-438a-ff7f52782dee (at 10.149.14.63@o2ib313) [1495208.967659] Lustre: Skipped 469 previous similar messages [1495834.166939] Lustre: MGS: Connection restored to f221e0a1-a356-089d-cf2b-f5f607b82fc6 (at 10.151.37.79@o2ib) [1495834.166945] Lustre: Skipped 373 previous similar messages [1496455.393346] Lustre: MGS: Connection restored to 2e37f244-c094-d9ab-aa95-8e7446f81d3b (at 10.151.43.44@o2ib) [1496455.393351] Lustre: Skipped 459 previous similar messages [1497065.311152] Lustre: MGS: Connection restored to 441cab8f-3eec-93f3-5546-d0f01d26b0ee (at 10.149.14.54@o2ib313) [1497065.311158] Lustre: Skipped 85 previous similar messages [1498057.662476] Lustre: MGS: Connection restored to 7748a140-e11d-e413-fe3b-752abcdd4c98 (at 10.149.9.188@o2ib313) [1498057.662482] Lustre: Skipped 207 previous similar messages [1498674.499457] Lustre: MGS: Connection restored to 781aa790-f031-de61-e376-f489a75d8eed (at 10.151.54.118@o2ib) [1498674.499463] Lustre: Skipped 139 previous similar messages [1499382.187538] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1499382.187545] Lustre: Skipped 1 previous similar message [1500054.580036] Lustre: MGS: Connection restored to b763dc73-9f1d-3bcb-e9fb-a9c77f13e8e8 (at 10.151.6.28@o2ib) [1500054.580041] Lustre: Skipped 139 previous similar messages [1500732.305943] Lustre: MGS: Connection restored to f01c08e4-6290-a7d0-26aa-53281465d869 (at 10.151.18.102@o2ib) [1500732.305948] Lustre: Skipped 3 previous similar messages [1501466.172889] Lustre: MGS: haven't heard from client e68a1c43-9458-8872-281a-106604a11710 (at 10.151.12.126@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8979c1bf4400, cur 1592183538 expire 1592183388 last 1592183311 [1501466.243273] Lustre: Skipped 1 previous similar message [1501543.084735] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1501543.118508] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.12.126@o2ib (303): c: 30, oc: 0, rc: 32 [1501567.085644] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1501567.119431] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.20@o2ib (320): c: 30, oc: 0, rc: 32 [1501572.085750] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1501572.119532] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.0.226@o2ib (325): c: 30, oc: 0, rc: 32 [1501578.086065] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1501578.119849] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1501578.153340] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.190@o2ib (331): c: 30, oc: 0, rc: 32 [1501578.194549] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1501595.086595] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1501595.120379] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1501595.154152] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.14.122@o2ib (347): c: 30, oc: 0, rc: 32 [1501595.195360] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1501632.922875] Lustre: MGS: Connection restored to 53367e41-bfd6-4279-a9ef-fe9adfda0155 (at 10.151.34.26@o2ib) [1501632.922881] Lustre: Skipped 55 previous similar messages [1502290.202457] Lustre: MGS: haven't heard from client aaba5bdd-53f7-4e2e-ce94-86581913dd53 (at 10.151.4.50@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897c7bf12800, cur 1592184362 expire 1592184212 last 1592184135 [1502290.272282] Lustre: Skipped 15 previous similar messages [1502330.022153] Lustre: MGS: Connection restored to 3462f68f-aa12-670a-393c-d4c8eb50bb61 (at 10.151.4.44@o2ib) [1502330.022158] Lustre: Skipped 369 previous similar messages [1502375.115128] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1502375.148915] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.102@o2ib (304): c: 30, oc: 0, rc: 32 [1502378.115181] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1502378.148975] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.126@o2ib (308): c: 30, oc: 0, rc: 32 [1502400.116020] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1502400.149805] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1502400.183585] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.50@o2ib (337): c: 30, oc: 0, rc: 32 [1502400.224214] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1503131.670547] Lustre: MGS: Connection restored to 3ac9dfc9-9a28-618c-343b-daed568b1979 (at 10.151.7.55@o2ib) [1503131.670552] Lustre: Skipped 173 previous similar messages [1503414.154121] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1503414.187914] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1503414.221688] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.187@o2ib (304): c: 32, oc: 0, rc: 32 [1503414.262893] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1503791.537934] Lustre: MGS: Connection restored to b763dc73-9f1d-3bcb-e9fb-a9c77f13e8e8 (at 10.151.6.28@o2ib) [1503791.537940] Lustre: Skipped 217 previous similar messages [1505335.231879] Lustre: MGS: Connection restored to b27d8a93-f4e3-f8b3-2e13-74734d6fe5b7 (at 10.149.9.110@o2ib313) [1505335.231884] Lustre: Skipped 17 previous similar messages [1505493.778586] Lustre: MGS: Connection restored to c2bf2336-1e0a-a6ae-897e-af927c56651b (at 10.151.3.34@o2ib) [1505493.778592] Lustre: Skipped 59 previous similar messages [1505659.544474] Lustre: MGS: Connection restored to 0b233af1-c7c4-0039-c1ba-50615e3d4e23 (at 10.151.36.60@o2ib) [1505659.544480] Lustre: Skipped 63 previous similar messages [1505977.185964] Lustre: MGS: Connection restored to fd12613c-a05f-d922-8cce-4cc04aefc3bb (at 10.149.15.33@o2ib313) [1505977.185970] Lustre: Skipped 13 previous similar messages [1506741.092416] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1506741.092422] Lustre: Skipped 173 previous similar messages [1507425.337436] Lustre: MGS: Connection restored to b047216f-c234-9a0c-d447-d42039571595 (at 10.149.11.198@o2ib313) [1507425.337442] Lustre: Skipped 283 previous similar messages [1508182.451439] Lustre: MGS: Connection restored to ce22c860-136b-9a63-13c1-1aaa124219f4 (at 10.151.24.34@o2ib) [1508182.451445] Lustre: Skipped 341 previous similar messages [1509183.132947] Lustre: MGS: Connection restored to 3076a2aa-cf10-565b-3ae2-362781878ac7 (at 10.151.4.56@o2ib) [1509183.132953] Lustre: Skipped 117 previous similar messages [1509810.387577] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1509810.421362] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.51@o2ib (303): c: 32, oc: 0, rc: 32 [1509864.282650] Lustre: MGS: Connection restored to ccd92ed3-3388-72e0-013e-81bfad043fb1 (at 10.151.38.155@o2ib) [1509864.282656] Lustre: Skipped 11 previous similar messages [1510557.600816] Lustre: MGS: Connection restored to 03636d43-3fa8-877a-89bd-3494956943fc (at 10.149.11.161@o2ib313) [1510557.600821] Lustre: Skipped 9 previous similar messages [1511175.695596] Lustre: MGS: Connection restored to c0117fa7-c949-89c7-c6c1-ae2476332177 (at 10.151.3.53@o2ib) [1511175.695603] Lustre: Skipped 63 previous similar messages [1511652.455279] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1511652.489065] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.98@o2ib (303): c: 32, oc: 0, rc: 32 [1512640.406419] Lustre: MGS: Connection restored to 8b4fa7a0-dec8-022a-9d92-9d9dc0dbf539 (at 10.151.3.170@o2ib) [1512640.406425] Lustre: Skipped 111 previous similar messages [1513306.886566] Lustre: MGS: Connection restored to 64c02b5f-47ad-c743-545c-6512f5b12ae1 (at 10.151.37.105@o2ib) [1513306.886572] Lustre: Skipped 173 previous similar messages [1513323.516754] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1513323.550542] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.57.147@o2ib (299): c: 32, oc: 0, rc: 32 [1513603.158336] Lustre: MGS: Connection restored to a308e4a6-2233-d1b7-e418-87155740af67 (at 10.151.5.73@o2ib) [1513603.158341] Lustre: Skipped 89 previous similar messages [1513973.253684] Lustre: MGS: Connection restored to ad21b351-e495-fb61-ce7c-9d00a9527292 (at 10.151.3.36@o2ib) [1513973.253694] Lustre: Skipped 67 previous similar messages [1514577.478153] Lustre: MGS: Connection restored to 4be696b1-1892-f9f5-e6ef-ca4d0bf9b4f0 (at 10.149.5.9@o2ib313) [1514577.478158] Lustre: Skipped 75 previous similar messages [1515182.825648] Lustre: MGS: Connection restored to 7748a140-e11d-e413-fe3b-752abcdd4c98 (at 10.149.9.188@o2ib313) [1515182.825654] Lustre: Skipped 125 previous similar messages [1516206.623150] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1516206.656937] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.8.76@o2ib (303): c: 32, oc: 0, rc: 32 [1516962.477328] Lustre: MGS: Connection restored to 6873ce44-79d7-2b1a-0051-4f3849c71206 (at 10.151.30.90@o2ib) [1516962.477334] Lustre: Skipped 231 previous similar messages [1517084.114866] Lustre: MGS: Connection restored to dc56b0bf-7342-99e3-256a-18feb35a252d (at 10.149.15.46@o2ib313) [1517084.114872] Lustre: Skipped 13 previous similar messages [1518456.810994] Lustre: MGS: Connection restored to ed066bd9-0bab-4267-6106-18de508c016d (at 10.149.9.34@o2ib313) [1518456.811000] Lustre: Skipped 155 previous similar messages [1518599.596325] Lustre: MGS: Connection restored to 441cab8f-3eec-93f3-5546-d0f01d26b0ee (at 10.149.14.54@o2ib313) [1518599.596331] Lustre: Skipped 59 previous similar messages [1518838.089834] Lustre: MGS: Connection restored to b27d8a93-f4e3-f8b3-2e13-74734d6fe5b7 (at 10.149.9.110@o2ib313) [1518838.089835] Lustre: MGS: Connection restored to a3fd8e6e-277e-a291-1a88-d4e46de5d08c (at 10.149.9.91@o2ib313) [1518838.089838] Lustre: Skipped 119 previous similar messages [1518946.352200] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1518946.352206] Lustre: Skipped 58 previous similar messages [1519665.999087] Lustre: MGS: Connection restored to 036e3026-6198-f8ad-1f59-2b8a40c3dd07 (at 10.149.10.47@o2ib313) [1519665.999093] Lustre: Skipped 1 previous similar message [1520280.408354] Lustre: MGS: Connection restored to 4cd0a9ff-be49-c12b-a7c3-3d93f7b96167 (at 10.151.32.194@o2ib) [1520280.408359] Lustre: Skipped 119 previous similar messages [1522095.969664] Lustre: MGS: Connection restored to 75102e82-b863-6587-32ed-df83142cc4d1 (at 10.151.37.130@o2ib) [1522095.969671] Lustre: Skipped 441 previous similar messages [1522440.413587] Lustre: MGS: Connection restored to f9c716a6-e44d-03f6-881c-64aa6d2c389e (at 10.149.3.147@o2ib313) [1522440.413593] Lustre: Skipped 77 previous similar messages [1522787.092123] Lustre: MGS: Connection restored to 7896e46a-3d03-e2ed-8888-5aadd7e25450 (at 10.151.37.72@o2ib) [1522787.092129] Lustre: Skipped 1 previous similar message [1523770.724259] Process accounting resumed [1524037.104091] Lustre: MGS: Connection restored to 988ecea8-9a13-45ae-4660-edd811a32751 (at 10.149.4.87@o2ib313) [1524037.104097] Lustre: Skipped 171 previous similar messages [1524161.561402] Lustre: MGS: Connection restored to 1e19f508-0d63-83d9-3450-944aa555d517 (at 10.151.37.110@o2ib) [1524161.561408] Lustre: Skipped 19 previous similar messages [1524353.670210] Lustre: MGS: Connection restored to b5dc840c-6eba-7061-b36a-c335cc05e780 (at 10.151.29.165@o2ib) [1524353.670215] Lustre: Skipped 9 previous similar messages [1524593.090730] Lustre: MGS: Connection restored to 7bb92d4d-44b4-c413-71f9-8d95ece41e2f (at 10.149.9.123@o2ib313) [1524593.090736] Lustre: Skipped 83 previous similar messages [1524948.615472] Lustre: MGS: Connection restored to 832d6ff6-dd33-cbce-23c6-cd6b65e0b652 (at 10.151.7.106@o2ib) [1524948.615478] Lustre: Skipped 59 previous similar messages [1527053.213905] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1527053.213911] Lustre: Skipped 139 previous similar messages [1527623.134568] Lustre: nbp8-MDT0000: haven't heard from client c6e2285c-2f40-e6b5-b38d-54eef11acb61 (at 10.149.14.55@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a301afbc00, cur 1592209694 expire 1592209544 last 1592209467 [1527623.208104] Lustre: Skipped 15 previous similar messages [1527706.709218] Lustre: MGS: Connection restored to 7aaee50c-779e-a1d7-b0ed-07a1e8a62337 (at 10.153.10.93@o2ib233) [1527706.709223] Lustre: Skipped 1 previous similar message [1527717.046902] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1527717.080688] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.150@o2ib (303): c: 32, oc: 0, rc: 32 [1530003.454322] Lustre: MGS: Connection restored to c7ea8796-7410-08ba-0d24-56934015a45a (at 10.149.1.241@o2ib313) [1530003.454327] Lustre: Skipped 77 previous similar messages [1530114.903596] Lustre: MGS: Connection restored to 1d68534d-8888-4523-d902-8bdb133549f0 (at 10.149.12.109@o2ib313) [1530114.903602] Lustre: Skipped 387 previous similar messages [1530460.565330] Lustre: MGS: Connection restored to e175d4e3-1e46-a1b5-aaee-e3cc5a96d23e (at 10.149.15.147@o2ib313) [1530460.565335] Lustre: Skipped 59 previous similar messages [1530555.782228] Lustre: MGS: Connection restored to e78be4b8-9f98-c078-263c-bc3ff6487a37 (at 10.149.1.200@o2ib313) [1530555.782233] Lustre: Skipped 61 previous similar messages [1530969.440494] Lustre: MGS: Connection restored to f77afdde-d8bf-9973-0d69-5a7d8593892f (at 10.141.6.131@o2ib417) [1530969.440499] Lustre: Skipped 37 previous similar messages [1531308.689971] Lustre: MGS: Connection restored to aeb269d8-ec7e-5b81-5100-33256f0743fb (at 10.149.13.55@o2ib313) [1531308.689976] Lustre: Skipped 3 previous similar messages [1532036.187762] Lustre: MGS: Connection restored to efdf875b-22b9-3a73-6479-fc3e22246a85 (at 10.149.11.193@o2ib313) [1532036.187767] Lustre: Skipped 723 previous similar messages [1532887.359772] Lustre: MGS: Connection restored to dc5f2de5-a847-ccfe-ecd2-de848465febc (at 10.149.12.170@o2ib313) [1532887.359777] Lustre: Skipped 249 previous similar messages [1534928.981531] Lustre: MGS: Connection restored to f402ec9d-56a5-d5b7-419d-9c98f247b250 (at 10.151.32.31@o2ib) [1534928.981537] Lustre: Skipped 461 previous similar messages [1535140.721489] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1535140.721495] Lustre: Skipped 399 previous similar messages [1535530.772364] Lustre: MGS: Connection restored to c3e41625-7382-ced5-cd88-3ba3c8b9a5f1 (at 10.151.32.22@o2ib) [1535530.772370] Lustre: Skipped 1 previous similar message [1536841.184173] Lustre: MGS: Connection restored to 13933e69-5f9b-7b7a-2693-fc1bbc1569cb (at 10.151.32.10@o2ib) [1536841.184179] Lustre: Skipped 51 previous similar messages [1537034.974522] Lustre: MGS: Connection restored to dc09575e-42ee-7c10-032c-042c4c3d80c9 (at 10.149.2.32@o2ib313) [1537034.974527] Lustre: Skipped 197 previous similar messages [1538290.726426] Lustre: MGS: Connection restored to cb6163c2-0455-cca3-3f84-26cf8bc24846 (at 10.151.4.30@o2ib) [1538290.726432] Lustre: Skipped 1 previous similar message [1538405.201570] Lustre: MGS: Connection restored to 31b7378c-d105-3385-26a6-e4ded45fff6f (at 10.151.18.42@o2ib) [1538405.201576] Lustre: Skipped 63 previous similar messages [1538464.037863] Lustre: MGS: Connection restored to 15753a52-7258-d72e-05e1-5bd43bff4107 (at 10.151.3.99@o2ib) [1538464.037869] Lustre: Skipped 55 previous similar messages [1538521.008042] Lustre: MGS: Connection restored to 99ec4e5c-c1b0-bd5d-98c3-d51a164d3ad0 (at 10.151.29.17@o2ib) [1538521.008048] Lustre: Skipped 39 previous similar messages [1539205.639505] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1539205.639510] Lustre: Skipped 85 previous similar messages [1539389.229641] Lustre: MGS: Connection restored to f667ac5c-614a-4f58-df05-d1fa1cfb0135 (at 10.149.9.31@o2ib313) [1539389.229647] Lustre: Skipped 1 previous similar message [1539721.432561] Lustre: MGS: Connection restored to f0eb3ae7-965f-b8e6-c0ba-ba2a2d0815de (at 10.151.37.48@o2ib) [1539721.432570] Lustre: Skipped 141 previous similar messages [1540478.014907] Lustre: MGS: Connection restored to be486df2-4e1a-f133-377c-eeb725d85e49 (at 10.149.9.95@o2ib313) [1540478.014912] Lustre: Skipped 67 previous similar messages [1541241.076749] Lustre: MGS: Connection restored to 18ac275c-4cea-c249-5ddf-ed3ca2bed372 (at 10.141.2.0@o2ib417) [1541241.076755] Lustre: Skipped 313 previous similar messages [1541861.141869] Lustre: MGS: Connection restored to ceffa0dd-795b-f19d-daa6-64a3156745c2 (at 10.151.32.225@o2ib) [1541861.141875] Lustre: Skipped 1 previous similar message [1542493.955967] Lustre: MGS: Connection restored to 15764b8d-c6e1-29aa-a345-f146f7f41c77 (at 10.151.4.77@o2ib) [1542493.955973] Lustre: Skipped 67 previous similar messages [1543244.832194] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1543244.832200] Lustre: Skipped 169 previous similar messages [1544064.037790] Lustre: MGS: Connection restored to 3a9909f1-a2c8-ecbc-cd30-79262dcd0a4c (at 10.149.8.223@o2ib313) [1544064.037796] Lustre: Skipped 661 previous similar messages [1544875.263975] Lustre: MGS: Connection restored to 1dac9938-f43d-13c7-d82a-9e4aa012f1db (at 10.151.6.163@o2ib) [1544875.263980] Lustre: Skipped 59 previous similar messages [1545625.755604] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [1545625.755610] Lustre: Skipped 63 previous similar messages [1546396.529259] Lustre: MGS: Connection restored to b3adebfe-1118-b6d9-1f36-d342231f2de8 (at 10.151.19.69@o2ib) [1546396.529265] Lustre: Skipped 207 previous similar messages [1547288.326512] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1547288.326517] Lustre: Skipped 69 previous similar messages [1547914.818252] Lustre: MGS: Connection restored to f5e65fe3-96da-c92b-6d6e-7008e6ff8da4 (at 10.151.32.176@o2ib) [1547914.818258] Lustre: Skipped 9 previous similar messages [1548669.724425] Lustre: MGS: Connection restored to 16a1b482-ba85-390d-6ccc-64e209999ded (at 10.151.2.103@o2ib) [1548669.724431] Lustre: Skipped 289 previous similar messages [1548723.817194] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1548723.850989] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.104@o2ib (237): c: 32, oc: 0, rc: 32 [1549595.136124] Lustre: MGS: Connection restored to 1e83361b-f6d7-7003-8e21-318a3de58729 (at 10.151.39.118@o2ib) [1549595.136129] Lustre: Skipped 179 previous similar messages [1550621.887242] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1550621.921014] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.72@o2ib (227): c: 32, oc: 0, rc: 32 [1550636.323495] Lustre: MGS: Connection restored to 35b1da24-665e-1ec9-6ea0-0adec3fb7384 (at 10.151.17.215@o2ib) [1550636.323501] Lustre: Skipped 1 previous similar message [1551253.409687] Lustre: MGS: Connection restored to c10fe971-ebdc-13ba-976a-7c2baebb361e (at 10.151.54.16@o2ib) [1551253.409693] Lustre: Skipped 389 previous similar messages [1552141.548965] Lustre: MGS: Connection restored to fe10dc02-8ee5-df33-6cc2-9f3eab2dd3c1 (at 10.141.5.144@o2ib417) [1552141.548970] Lustre: Skipped 107 previous similar messages [1553059.495191] Lustre: MGS: Connection restored to d8522f51-9eb8-3e86-a1b0-60ed40952839 (at 10.151.3.179@o2ib) [1553059.495197] Lustre: Skipped 475 previous similar messages [1553687.803809] Lustre: MGS: Connection restored to 1952364d-5b38-31b7-5dc8-017ea19a6858 (at 10.151.18.32@o2ib) [1553687.803815] Lustre: Skipped 291 previous similar messages [1554291.671368] Lustre: MGS: Connection restored to 422d2855-a56f-3024-a8b9-9ab9b814c34e (at 10.151.38.209@o2ib) [1554291.671374] Lustre: Skipped 15 previous similar messages [1555025.680544] Lustre: MGS: Connection restored to d34a7f1e-aabe-16b7-559c-465c7c0a38c6 (at 10.151.28.208@o2ib) [1555025.680550] Lustre: Skipped 55 previous similar messages [1555632.423168] Lustre: MGS: Connection restored to e8ab54c1-77d5-233b-b330-01baab3c43ba (at 10.151.39.246@o2ib) [1555632.423174] Lustre: Skipped 237 previous similar messages [1556236.085430] Lustre: MGS: Connection restored to d08c702f-c082-f9e9-2c83-33f507a4b575 (at 10.151.28.202@o2ib) [1556236.085435] Lustre: Skipped 427 previous similar messages [1556852.766554] Lustre: MGS: Connection restored to 21c9d382-d5ec-b61d-cc4c-3801092b0a15 (at 10.151.15.233@o2ib) [1556852.766560] Lustre: Skipped 51 previous similar messages [1557458.549892] Lustre: MGS: Connection restored to 537a72bc-1d5b-a000-ba93-037dfddb2fa4 (at 10.149.9.194@o2ib313) [1557458.549898] Lustre: Skipped 71 previous similar messages [1558064.886737] Lustre: MGS: Connection restored to 64446787-7070-8f19-01f8-e55d2a4ea0b2 (at 10.151.4.73@o2ib) [1558064.886743] Lustre: Skipped 165 previous similar messages [1558838.226501] Lustre: MGS: Connection restored to 7dbd2182-4910-d3c7-e06b-d9a387247c6f (at 10.151.34.151@o2ib) [1558838.226506] Lustre: Skipped 217 previous similar messages [1559038.289878] Lustre: MGS: haven't heard from client 216c4534-46f0-8370-d015-98ec9779694c (at 10.149.4.49@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974b8ed1c00, cur 1592241108 expire 1592240958 last 1592240881 [1559038.360556] Lustre: Skipped 1 previous similar message [1559065.291340] Lustre: nbp8-MDT0000: haven't heard from client ac924748-9efd-bc02-37e6-c3443657137f (at 10.149.4.85@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899de2fd7800, cur 1592241135 expire 1592240985 last 1592240908 [1559065.364617] Lustre: Skipped 6 previous similar messages [1559256.297606] Lustre: MGS: haven't heard from client 8f87432b-ddd6-fd8a-c03d-9b7f23f664fe (at 10.151.34.35@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897cfb210400, cur 1592241326 expire 1592241176 last 1592241099 [1559256.367739] Lustre: Skipped 6 previous similar messages [1559263.295513] Lustre: nbp8-MDT0000: haven't heard from client c9d87f70-11dd-d816-055b-48b0b6eb172a (at 10.151.35.124@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d4b058000, cur 1592241333 expire 1592241183 last 1592241106 [1559263.368476] Lustre: Skipped 6 previous similar messages [1559356.209447] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1559356.243233] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.26@o2ib (309): c: 31, oc: 0, rc: 32 [1559360.209668] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1559360.243457] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.35@o2ib (313): c: 31, oc: 0, rc: 32 [1559363.209767] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1559363.243561] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.41@o2ib (316): c: 31, oc: 0, rc: 32 [1559376.210241] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1559376.244028] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.60@o2ib (339): c: 30, oc: 0, rc: 32 [1559381.210468] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1559381.244260] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1559381.277754] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.124@o2ib (345): c: 30, oc: 0, rc: 32 [1559381.318960] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1559446.099152] Lustre: MGS: Connection restored to ee2c56e3-fe76-3a1b-740d-142d9b6bc6a0 (at 10.149.16.1@o2ib313) [1559446.099158] Lustre: Skipped 59 previous similar messages [1559787.225381] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1559787.259173] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1559787.292661] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.38@o2ib (303): c: 32, oc: 0, rc: 32 [1559787.333298] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1559815.226333] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1559815.260118] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.39@o2ib (275): c: 32, oc: 0, rc: 32 [1560147.165785] Lustre: MGS: Connection restored to 1e2c7e28-81ec-165e-dbaf-8e6599e43b01 (at 10.151.3.38@o2ib) [1560147.165791] Lustre: Skipped 89 previous similar messages [1560804.452310] Lustre: MGS: Connection restored to 87c3f72b-9357-8e30-e468-60af5a01781c (at 10.151.28.53@o2ib) [1560804.452316] Lustre: Skipped 13 previous similar messages [1560874.265267] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1560874.299068] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.97@o2ib (208): c: 32, oc: 0, rc: 32 [1560882.266550] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1560882.300344] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1560882.333827] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.12@o2ib (304): c: 32, oc: 0, rc: 32 [1560882.374463] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1560891.266009] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1560891.299801] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [1560891.333575] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.31@o2ib (304): c: 32, oc: 0, rc: 32 [1560891.374209] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [1560929.267288] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1560929.301083] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [1560929.334863] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.106@o2ib (304): c: 32, oc: 0, rc: 32 [1560929.375785] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [1560967.269764] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1560967.303550] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1560967.337049] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.82@o2ib (304): c: 32, oc: 0, rc: 32 [1560967.377685] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1561098.273510] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1561098.307289] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [1561098.341074] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.40@o2ib (226): c: 32, oc: 0, rc: 32 [1561098.381711] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [1561500.170250] Lustre: MGS: Connection restored to 4f58ab6e-4d3a-2dde-2c88-cdeb6d1e0621 (at 10.151.38.202@o2ib) [1561500.170256] Lustre: Skipped 81 previous similar messages [1561784.605830] LNet: 39423:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.54.23@o2ib version 12/12 incarnation 1591730171436002/1592243805495602 [1561801.389880] Lustre: nbp8-MDT0000: haven't heard from client dfd0ed18-2cb0-6f0e-2809-fd4bedb2731d (at 10.151.54.23@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982abf17400, cur 1592243871 expire 1592243721 last 1592243644 [1561801.462565] Lustre: Skipped 6 previous similar messages [1561994.398036] Lustre: nbp8-MDT0000: haven't heard from client b49456fb-c0a9-756f-8179-7b00566790d0 (at 10.151.49.127@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8980b476ac00, cur 1592244064 expire 1592243914 last 1592243837 [1561994.471013] Lustre: Skipped 1 previous similar message [1562080.745426] LNet: 86814:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.49.127@o2ib version 12/12 incarnation 1591718594944181/1592244002190039 [1562105.995879] Lustre: MGS: Connection restored to 90a5fcf4-bd67-8e87-1a32-fbc121e02511 (at 10.151.33.13@o2ib) [1562105.995884] Lustre: Skipped 159 previous similar messages [1562446.771022] LNet: 75314:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.54.16@o2ib version 12/12 incarnation 1591730172614366/1592244470804665 [1562457.413508] Lustre: nbp8-MDT0000: haven't heard from client 77d18bac-54dc-808b-3e66-3df07abfd869 (at 10.151.54.16@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3dbbde400, cur 1592244527 expire 1592244377 last 1592244300 [1562457.486183] Lustre: Skipped 1 previous similar message [1562648.422035] Lustre: MGS: haven't heard from client 8d8e921a-08c0-f015-b08b-35b788817f5f (at 10.151.34.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897ee499a800, cur 1592244718 expire 1592244568 last 1592244491 [1562648.492431] Lustre: Skipped 1 previous similar message [1562661.420989] Lustre: nbp8-MDT0000: haven't heard from client f26ba0b4-0249-df6e-28bb-80b26e66878d (at 10.151.34.204@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899ff1ea0000, cur 1592244731 expire 1592244581 last 1592244504 [1562724.423228] Lustre: MGS: haven't heard from client c84b01f2-e751-c831-eea4-264e5d252bd4 (at 10.151.35.242@o2ib) in 200 seconds. I think it's dead, and I am evicting it. exp ffff89831e782400, cur 1592244794 expire 1592244644 last 1592244594 [1562728.301587] Lustre: MGS: Connection restored to 5e89f64c-ef31-49ad-a872-706f27ed46b2 (at 10.151.39.115@o2ib) [1562728.301593] Lustre: Skipped 259 previous similar messages [1562778.335212] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1562778.368984] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 17 previous similar messages [1562778.403052] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.204@o2ib (344): c: 22, oc: 0, rc: 32 [1562778.444253] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 17 previous similar messages [1562874.338824] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1562874.372609] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.242@o2ib (338): c: 28, oc: 0, rc: 32 [1563356.067232] Lustre: MGS: Connection restored to 539d6222-8063-a658-ac03-9c0f5dc1cec2 (at 10.151.4.174@o2ib) [1563356.067238] Lustre: Skipped 249 previous similar messages [1564053.088628] Lustre: MGS: Connection restored to 18d5f766-ec00-235f-98d1-44ed87d1bc1c (at 10.151.8.150@o2ib) [1564053.088634] Lustre: Skipped 185 previous similar messages [1564996.517976] Lustre: MGS: Connection restored to b454b8e6-4481-f959-6899-584188191898 (at 10.151.13.187@o2ib) [1564996.517982] Lustre: Skipped 33 previous similar messages [1565641.340371] Lustre: MGS: Connection restored to 964821a5-cfd0-d790-c835-4a4ba8dc7d5a (at 10.151.24.30@o2ib) [1565641.340377] Lustre: Skipped 25 previous similar messages [1566299.600164] Lustre: MGS: Connection restored to cef050ba-84e9-b48a-16f8-dd6f08bca90c (at 10.151.38.207@o2ib) [1566299.600170] Lustre: Skipped 19 previous similar messages [1566904.906248] Lustre: MGS: Connection restored to 9d21eda3-8f0d-0cd0-a1c7-6d6779bb9357 (at 10.151.10.118@o2ib) [1566904.906254] Lustre: Skipped 233 previous similar messages [1567581.488463] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1567581.488468] Lustre: Skipped 125 previous similar messages [1568400.041007] Lustre: MGS: Connection restored to 547d7fc1-cace-55d4-64c0-ee30017e5452 (at 10.151.1.67@o2ib) [1568400.041013] Lustre: Skipped 89 previous similar messages [1569006.253202] Lustre: MGS: Connection restored to 8672ebad-a4b5-f191-0806-4e92d93fc08e (at 10.141.3.95@o2ib417) [1569006.253208] Lustre: Skipped 75 previous similar messages [1569862.568750] Lustre: MGS: Connection restored to 99da455c-a4e4-6871-2e90-a20a443b1a5a (at 10.141.2.166@o2ib417) [1569862.568756] Lustre: Skipped 441 previous similar messages [1570523.421174] Lustre: MGS: Connection restored to 03cf96c9-cc95-e8d5-d7a2-860787c744dd (at 10.151.47.94@o2ib) [1570523.421180] Lustre: Skipped 83 previous similar messages [1571222.072585] Lustre: MGS: Connection restored to 8be5ceba-10cc-d606-7a15-9f15346f4d79 (at 10.149.3.0@o2ib313) [1571222.072591] Lustre: Skipped 91 previous similar messages [1571928.956468] LNet: 39423:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.14.184@o2ib version 12/12 incarnation 1591790016327316/1592253992039154 [1571929.010686] Lustre: MGS: Connection restored to a3a1da54-3ea3-dd12-7cb0-6d41cd85ab71 (at 10.151.14.184@o2ib) [1571929.010691] Lustre: Skipped 447 previous similar messages [1571996.761993] Lustre: MGS: haven't heard from client 8d4ebc48-242a-b3e0-b8bc-5ae45ff94d61 (at 10.151.14.184@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89806d59f000, cur 1592254066 expire 1592253916 last 1592253839 [1571996.832382] Lustre: Skipped 1 previous similar message [1572013.762768] Lustre: nbp8-MDT0000: haven't heard from client fcb320f0-102e-eb4d-6218-82205b15f046 (at 10.151.14.184@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898c06e97000, cur 1592254083 expire 1592253933 last 1592253856 [1572558.560820] Lustre: MGS: Connection restored to eb861dff-6a24-b365-f9f8-56c722907c88 (at 10.141.5.223@o2ib417) [1572558.560826] Lustre: Skipped 161 previous similar messages [1573175.490278] Lustre: MGS: Connection restored to 15753a52-7258-d72e-05e1-5bd43bff4107 (at 10.151.3.99@o2ib) [1573175.490284] Lustre: Skipped 291 previous similar messages [1573466.729948] LNet: 77837:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.24.238@o2ib version 12/12 incarnation 1591658086919237/1592255523987346 [1573515.816562] Lustre: MGS: haven't heard from client 5c784884-e389-c747-32c5-ba60fcf5bc5c (at 10.151.24.243@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982853cac00, cur 1592255585 expire 1592255435 last 1592255358 [1573617.732402] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1573617.766197] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.42.140@o2ib (327): c: 30, oc: 0, rc: 32 [1573637.732100] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1573637.765895] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.24.243@o2ib (348): c: 30, oc: 0, rc: 32 [1573803.126565] Lustre: MGS: Connection restored to 102b6d6b-b10c-189d-852c-d4859c1cdc28 (at 10.151.15.93@o2ib) [1573803.126571] Lustre: Skipped 433 previous similar messages [1574454.431733] Lustre: MGS: Connection restored to a8cf0837-5cf3-076c-e658-9b42039d5f78 (at 10.151.52.212@o2ib) [1574454.431738] Lustre: Skipped 51 previous similar messages [1575086.735563] Lustre: MGS: Connection restored to 43798ae1-6b8b-0c10-445e-8c892ba08045 (at 10.149.1.177@o2ib313) [1575086.735569] Lustre: Skipped 37 previous similar messages [1575738.037056] Lustre: MGS: Connection restored to 392e65f8-428c-cc85-158b-990f8565f30d (at 10.151.5.233@o2ib) [1575738.037061] Lustre: Skipped 135 previous similar messages [1576616.422142] Lustre: MGS: Connection restored to 67e1035b-6987-c7a0-bd87-08adc3f76715 (at 10.151.46.171@o2ib) [1576616.422147] Lustre: Skipped 221 previous similar messages [1577216.948203] Lustre: MGS: Connection restored to b845d414-afd1-9ee4-341e-b208a2671f7a (at 10.151.24.170@o2ib) [1577216.948209] Lustre: Skipped 301 previous similar messages [1577404.150186] Lustre: 14119:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1592259315/real 1592259473] req@ffff89875af28480 x1667979536335040/t0(0) o104->nbp8-MDT0000@10.151.59.253@o2ib:15/16 lens 296/224 e 0 to 1 dl 1592259938 ref 1 fl Rpc:eX/0/ffffffff rc 0/-1 [1577404.244124] LustreError: 14119:0:(ldlm_lockd.c:681:ldlm_handle_ast_error()) ### client (nid 10.151.59.253@o2ib) returned error from blocking AST (req@ffff89875af28480 x1667979536335040 status -107 rc -107), evict it ns: mdt-nbp8-MDT0000_UUID lock: ffff896a3e3b9200/0xa22cee3e7f83efef lrc: 4/0,0 mode: PR/PR res: [0x3608b5622:0x14aea:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.151.59.253@o2ib remote: 0xa47167351fd3140 expref: 33 pid: 8587 timeout: 1577757 lvb_type: 0 [1577404.388380] LustreError: 138-a: nbp8-MDT0000: A client on nid 10.151.59.253@o2ib was evicted due to a lock blocking callback time out: rc -107 [1577404.431062] LustreError: 5711:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 158s: evicting client at 10.151.59.253@o2ib ns: mdt-nbp8-MDT0000_UUID lock: ffff896a3e3b9200/0xa22cee3e7f83efef lrc: 3/0,0 mode: PR/PR res: [0x3608b5622:0x14aea:0x0].0x0 bits 0x13/0x0 rrc: 31 type: IBT flags: 0x60200400000020 nid: 10.151.59.253@o2ib remote: 0xa47167351fd3140 expref: 34 pid: 8587 timeout: 0 lvb_type: 0 [1577448.962121] Lustre: nbp8-MDT0000: haven't heard from client 264632c7-6caa-09b8-7c6f-2b8f923dcafc (at 10.151.1.33@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3f6e19400, cur 1592259518 expire 1592259368 last 1592259291 [1577449.034511] Lustre: Skipped 5 previous similar messages [1577454.872891] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 8 seconds [1577454.906684] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.28@o2ib (231): c: 29, oc: 0, rc: 32 [1577455.873027] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 9 seconds [1577456.872056] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.33@o2ib (234): c: 29, oc: 0, rc: 32 [1577456.912696] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1577457.873074] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 11 seconds [1577457.907159] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1577459.872202] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.39@o2ib (236): c: 29, oc: 0, rc: 32 [1577459.912843] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1577462.872186] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 16 seconds [1577462.906266] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1577468.872499] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.57@o2ib (241): c: 29, oc: 0, rc: 32 [1577468.913155] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1577471.872519] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 25 seconds [1577471.906576] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [1577477.872817] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.75@o2ib (253): c: 29, oc: 0, rc: 32 [1577477.913465] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 13 previous similar messages [1577870.059564] Lustre: MGS: Connection restored to ee99a52f-8a9c-99f8-e31d-2dc7c9b7754f (at 10.149.5.92@o2ib313) [1577870.059569] Lustre: Skipped 177 previous similar messages [1578567.126570] Lustre: MGS: Connection restored to 7b49f8f8-2761-99fc-4a17-2d78a2f2ea84 (at 10.141.2.33@o2ib417) [1578567.126576] Lustre: Skipped 57 previous similar messages [1579302.664972] Lustre: MGS: Connection restored to 769d709e-b115-c48b-f899-1827a08d5d10 (at 10.151.32.185@o2ib) [1579302.664978] Lustre: Skipped 113 previous similar messages [1579626.048196] Lustre: MGS: haven't heard from client ad3cc2ac-88ba-d08f-fb38-e42f4ecde283 (at 10.151.1.28@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89752ab2f400, cur 1592261695 expire 1592261545 last 1592261468 [1579626.118010] Lustre: Skipped 54 previous similar messages [1579726.955642] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1579726.989419] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 13 previous similar messages [1579727.023492] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.28@o2ib (328): c: 29, oc: 0, rc: 32 [1579727.064134] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1579743.956257] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1579743.990050] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1579744.023539] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.62@o2ib (343): c: 29, oc: 0, rc: 32 [1579744.064165] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1579748.956377] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1579748.990162] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 7 previous similar messages [1579749.023943] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.71@o2ib (349): c: 29, oc: 0, rc: 32 [1579749.064572] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 7 previous similar messages [1579992.584909] Lustre: MGS: Connection restored to 79cf7622-1186-f63a-af87-836804807519 (at 10.151.9.198@o2ib) [1579992.584914] Lustre: Skipped 23 previous similar messages [1580605.527798] Lustre: MGS: Connection restored to 79960ef8-f288-a1a5-588f-0c8c728a9d6d (at 10.149.14.114@o2ib313) [1580605.527804] Lustre: Skipped 499 previous similar messages [1581290.678014] Lustre: MGS: Connection restored to deca845a-b624-6fb5-5a20-adcbab24191e (at 10.149.14.238@o2ib313) [1581290.678020] Lustre: Skipped 385 previous similar messages [1581701.029533] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1581701.063327] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [1581701.097108] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.180@o2ib (263): c: 32, oc: 0, rc: 32 [1581701.138308] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [1581903.137781] Lustre: MGS: Connection restored to 4bf5f763-f751-1422-b0b0-a7e933ad9c2e (at 10.151.51.135@o2ib) [1581903.137787] Lustre: Skipped 425 previous similar messages [1582552.320797] Lustre: MGS: Connection restored to 6654983a-b380-89ec-3259-588b2c3c1b42 (at 10.151.4.41@o2ib) [1582552.320803] Lustre: Skipped 103 previous similar messages [1583460.856931] Lustre: MGS: Connection restored to 92ad17a7-2682-18b1-a7c3-be173b8d5724 (at 10.149.15.16@o2ib313) [1583460.856937] Lustre: Skipped 27 previous similar messages [1584226.377163] Lustre: MGS: Connection restored to 558a87aa-677a-eda4-dd71-28ab92d98068 (at 10.151.6.166@o2ib) [1584226.377169] Lustre: Skipped 519 previous similar messages [1585353.293602] Lustre: MGS: Connection restored to 9b66254e-06ef-ed9f-fc4c-2b09462df9c3 (at 10.151.57.209@o2ib) [1585353.293608] Lustre: Skipped 239 previous similar messages [1585573.172051] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1585573.205837] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.53@o2ib (298): c: 32, oc: 0, rc: 32 [1585964.725071] Lustre: MGS: Connection restored to 420d6afe-be5e-1b6a-dcb3-64926f3c1c40 (at 10.141.5.193@o2ib417) [1585964.725076] Lustre: Skipped 33 previous similar messages [1586685.236913] Lustre: MGS: Connection restored to b1465c7e-3a59-61f6-884b-b984b59cbccd (at 10.151.42.165@o2ib) [1586685.236919] Lustre: Skipped 459 previous similar messages [1587400.734041] Lustre: MGS: Connection restored to 7c9d6508-5157-532a-f793-a9a748e4116f (at 10.149.5.21@o2ib313) [1587400.734047] Lustre: Skipped 101 previous similar messages [1587601.337234] Lustre: nbp8-MDT0000: haven't heard from client 01e8b943-3eb3-201d-6515-799eab9a998e (at 10.151.11.91@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3f12fd800, cur 1592269670 expire 1592269520 last 1592269443 [1587601.409913] Lustre: Skipped 29 previous similar messages [1587705.250920] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1587705.284692] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.91@o2ib (330): c: 30, oc: 0, rc: 32 [1588082.594066] Lustre: MGS: Connection restored to 33074ccd-a623-11b1-016f-3fd0b96f6dc4 (at 10.151.31.139@o2ib) [1588082.594071] Lustre: Skipped 251 previous similar messages [1588733.692568] Lustre: MGS: Connection restored to cc98ef92-042c-6558-0674-f2c04dd506db (at 10.151.50.109@o2ib) [1588733.692574] Lustre: Skipped 85 previous similar messages [1589477.610383] Lustre: MGS: Connection restored to dee77a4d-f0e9-c8f2-7ac2-00a3e6ce3b76 (at 10.151.47.173@o2ib) [1589477.610389] Lustre: Skipped 291 previous similar messages [1589783.417497] Lustre: nbp8-MDT0000: haven't heard from client 6ce2ca35-0472-1b2f-d393-405a99be049b (at 10.151.47.173@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973d4374c00, cur 1592271852 expire 1592271702 last 1592271625 [1589783.490461] Lustre: Skipped 1 previous similar message [1589877.330886] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1589877.364681] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.48.13@o2ib (318): c: 30, oc: 0, rc: 32 [1589879.331959] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1589879.365737] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.48.17@o2ib (321): c: 30, oc: 0, rc: 32 [1589882.331179] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1589882.364976] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1589882.398751] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.179@o2ib (325): c: 30, oc: 0, rc: 32 [1589882.439951] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1589929.332785] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1589929.366577] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1589929.400064] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.48.16@o2ib (345): c: 30, oc: 0, rc: 32 [1589929.440986] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1590033.336675] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1590033.370464] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.76@o2ib (292): c: 32, oc: 0, rc: 32 [1590259.950911] Lustre: MGS: Connection restored to 51291f8d-331d-8c35-d695-8893baee3e9b (at 10.151.4.76@o2ib) [1590259.950916] Lustre: Skipped 129 previous similar messages [1590935.131665] Lustre: MGS: Connection restored to 82382329-d059-a55c-c080-ceee9e0ebc70 (at 10.151.23.23@o2ib) [1590935.131671] Lustre: Skipped 77 previous similar messages [1591592.168362] Lustre: MGS: Connection restored to 3d234f71-d483-f5f5-3158-c54bb14df38b (at 10.151.32.180@o2ib) [1591592.168367] Lustre: Skipped 19 previous similar messages [1592214.292581] Lustre: MGS: Connection restored to 1a0d93e1-e5b8-1b6c-2ea0-280e93ce3ba6 (at 10.149.16.5@o2ib313) [1592214.292587] Lustre: Skipped 109 previous similar messages [1592941.840935] Lustre: MGS: Connection restored to 9621e51f-47c8-8409-4bfe-8d58f3622b0f (at 10.149.16.10@o2ib313) [1592941.840941] Lustre: Skipped 27 previous similar messages [1593542.731713] Lustre: MGS: Connection restored to 9b038494-0f48-b928-57c2-c03479061fe8 (at 10.151.4.102@o2ib) [1593542.731719] Lustre: Skipped 217 previous similar messages [1594177.457705] Lustre: MGS: Connection restored to 5d0820db-00c4-8f50-cc3c-d72a1bc55dae (at 10.141.6.136@o2ib417) [1594177.457711] Lustre: Skipped 293 previous similar messages [1594573.502172] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1594573.535958] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.162@o2ib (303): c: 32, oc: 0, rc: 32 [1594796.686172] Lustre: MGS: Connection restored to becc3399-0d45-8973-b3fc-2900b72ea16c (at 10.149.14.84@o2ib313) [1594796.686178] Lustre: Skipped 263 previous similar messages [1595227.616517] Lustre: nbp8-MDT0000: haven't heard from client 614edecd-fd8e-9c37-c27e-f8fdac5add6b (at 10.151.13.202@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ab25e800, cur 1592277296 expire 1592277146 last 1592277069 [1595227.689482] Lustre: Skipped 15 previous similar messages [1595303.618819] Lustre: nbp8-MDT0000: haven't heard from client b1c9bfe1-eb1d-763b-462d-7a7641aa0f32 (at 10.151.1.106@o2ib) in 211 seconds. I think it's dead, and I am evicting it. exp ffff89a38694a000, cur 1592277372 expire 1592277222 last 1592277161 [1595303.691505] Lustre: Skipped 1 previous similar message [1595339.531051] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1595339.564832] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.13.202@o2ib (338): c: 30, oc: 0, rc: 32 [1595415.532853] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1595415.566641] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.93@o2ib (306): c: 30, oc: 0, rc: 32 [1595416.533892] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1595416.567685] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1595416.601181] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.95@o2ib (308): c: 30, oc: 0, rc: 32 [1595416.641824] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1595419.532998] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1595419.566777] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1595419.600271] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.101@o2ib (312): c: 30, oc: 0, rc: 32 [1595419.641193] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1595424.534279] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1595424.568070] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 9 previous similar messages [1595424.601854] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.112@o2ib (315): c: 30, oc: 0, rc: 32 [1595424.642783] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 9 previous similar messages [1595458.946701] Lustre: MGS: Connection restored to e4f417d7-1bd3-6aa3-f9e5-f7a6d62a24a9 (at 10.149.6.63@o2ib313) [1595458.946708] Lustre: Skipped 137 previous similar messages [1595467.534874] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1595467.568653] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.97@o2ib (332): c: 29, oc: 0, rc: 32 [1596060.471524] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1596060.471530] Lustre: Skipped 243 previous similar messages [1596706.361445] Lustre: MGS: Connection restored to 15ed3306-ed93-d6ab-ebbe-87098322fe69 (at 10.151.47.74@o2ib) [1596706.361450] Lustre: Skipped 87 previous similar messages [1597365.719039] Lustre: MGS: Connection restored to d8522f51-9eb8-3e86-a1b0-60ed40952839 (at 10.151.3.179@o2ib) [1597365.719045] Lustre: Skipped 23 previous similar messages [1598085.589798] Lustre: MGS: Connection restored to fb4a225e-236c-3883-6d3e-c5fd3cec39c3 (at 10.151.9.10@o2ib) [1598085.589803] Lustre: Skipped 611 previous similar messages [1598832.009036] Lustre: MGS: Connection restored to 9e9aa743-8c85-0fbe-310a-942865aa987d (at 10.141.6.44@o2ib417) [1598832.009042] Lustre: Skipped 233 previous similar messages [1599517.872070] Lustre: MGS: Connection restored to d4612f6f-619e-ecbb-8531-7676043c8c65 (at 10.151.19.92@o2ib) [1599517.872076] Lustre: Skipped 19 previous similar messages [1600346.978233] Lustre: MGS: Connection restored to 8e318c0c-62a4-0f50-cc1b-b0149dc95679 (at 10.151.9.49@o2ib) [1600346.978239] Lustre: Skipped 253 previous similar messages [1600998.075852] Lustre: MGS: Connection restored to 32ac41ff-de56-b8b0-3b0b-ad6c5a6a0257 (at 10.151.53.29@o2ib) [1600998.075858] Lustre: Skipped 395 previous similar messages [1601599.780296] Lustre: MGS: Connection restored to 06fe1647-283b-7656-1c5c-f2cde7368b7a (at 10.141.5.58@o2ib417) [1601599.780301] Lustre: Skipped 45 previous similar messages [1602357.721198] Lustre: MGS: Connection restored to 7598e18b-531c-a6dd-acbf-1d8fff7adfe1 (at 10.141.5.162@o2ib417) [1602357.721204] Lustre: Skipped 223 previous similar messages [1602964.077953] Lustre: MGS: Connection restored to ccd92ed3-3388-72e0-013e-81bfad043fb1 (at 10.151.38.155@o2ib) [1602964.077959] Lustre: Skipped 107 previous similar messages [1603389.914611] Lustre: nbp8-MDT0000: haven't heard from client ad01228a-9978-712f-e72a-1759aa4cdbfd (at 10.151.44.242@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897950f7a400, cur 1592285458 expire 1592285308 last 1592285231 [1603389.987572] Lustre: Skipped 37 previous similar messages [1603465.827077] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1603465.860869] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1603465.894650] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.236@o2ib (303): c: 29, oc: 0, rc: 32 [1603465.935859] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1603468.827183] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1603468.860975] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1603468.894756] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.242@o2ib (306): c: 29, oc: 0, rc: 32 [1603468.935965] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1603474.827409] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1603474.861202] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1603474.894978] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.200@o2ib (312): c: 29, oc: 0, rc: 32 [1603474.936185] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1603503.828467] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1603503.862246] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1603503.896032] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.40.225@o2ib (340): c: 29, oc: 0, rc: 32 [1603503.937240] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1603679.211898] Lustre: MGS: Connection restored to a7e49a56-e2b6-31c6-f86f-4b6bff21b8e5 (at 10.151.1.103@o2ib) [1603679.211904] Lustre: Skipped 121 previous similar messages [1604146.941244] Lustre: nbp8-MDT0000: haven't heard from client 635062c2-26f5-ba01-d5a9-444d792ce017 (at 10.151.45.203@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d94142000, cur 1592286215 expire 1592286065 last 1592285988 [1604147.014208] Lustre: Skipped 33 previous similar messages [1604161.951368] Lustre: MGS: haven't heard from client 9431efcf-026d-b66a-25ce-f291ed94416e (at 10.151.45.203@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8997b7312800, cur 1592286230 expire 1592286080 last 1592286003 [1604162.021757] Lustre: Skipped 14 previous similar messages [1604253.856043] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1604253.889836] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [1604253.923606] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.94@o2ib (311): c: 29, oc: 0, rc: 32 [1604253.964240] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [1604255.857130] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1604255.890909] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1604255.924681] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.98@o2ib (314): c: 29, oc: 0, rc: 32 [1604255.965317] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1604276.857879] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1604276.891657] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 4 previous similar messages [1604276.925440] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.242@o2ib (335): c: 29, oc: 0, rc: 32 [1604276.966641] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 4 previous similar messages [1604317.916789] Lustre: MGS: Connection restored to e302294f-18ad-f68b-f18f-317eeaa1ea47 (at 10.141.6.70@o2ib417) [1604317.916795] Lustre: Skipped 369 previous similar messages [1605002.973523] Lustre: nbp8-MDT0000: haven't heard from client ea8d6a76-d279-3e52-821c-bb7c7639cfc1 (at 10.151.46.238@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8978ef2ca000, cur 1592287071 expire 1592286921 last 1592286844 [1605003.046501] Lustre: Skipped 14 previous similar messages [1605022.974470] Lustre: MGS: haven't heard from client 059e53ec-db69-15f0-7ae0-16032f35baf5 (at 10.151.46.223@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff896e66b5a800, cur 1592287091 expire 1592286941 last 1592286864 [1605023.044895] Lustre: Skipped 20 previous similar messages [1605101.887149] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1605101.920943] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [1605101.954711] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.215@o2ib (305): c: 30, oc: 0, rc: 32 [1605101.995910] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [1605103.928117] Lustre: MGS: Connection restored to 15ed3306-ed93-d6ab-ebbe-87098322fe69 (at 10.151.47.74@o2ib) [1605103.928123] Lustre: Skipped 439 previous similar messages [1605104.887379] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1605104.921159] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.222@o2ib (306): c: 30, oc: 0, rc: 32 [1605107.887370] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1605107.921150] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.47.227@o2ib (309): c: 29, oc: 0, rc: 32 [1605127.888105] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1605127.921891] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [1605127.955664] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.220@o2ib (331): c: 30, oc: 0, rc: 32 [1605127.996864] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [1605136.889438] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1605136.923222] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 5 previous similar messages [1605136.957002] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.46.238@o2ib (340): c: 30, oc: 0, rc: 32 [1605136.998207] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 5 previous similar messages [1605722.742991] Lustre: MGS: Connection restored to ce5edce1-a1b0-4765-ade0-b4405b473505 (at 10.151.32.248@o2ib) [1605722.743004] Lustre: Skipped 89 previous similar messages [1606144.926612] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1606144.960399] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 6 previous similar messages [1606144.994180] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.53.159@o2ib (304): c: 32, oc: 0, rc: 32 [1606145.035382] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 6 previous similar messages [1606382.055663] Lustre: MGS: Connection restored to 05d9f6ce-f065-77d6-95a8-8b646e73a08d (at 10.151.14.198@o2ib) [1606382.055670] Lustre: Skipped 143 previous similar messages [1607025.068284] Lustre: MGS: Connection restored to 907b55f3-f455-8f9d-ecfa-1a42fa8045c5 (at 10.151.45.105@o2ib) [1607025.068290] Lustre: Skipped 1187 previous similar messages [1607854.077844] Lustre: nbp8-MDT0000: haven't heard from client 671e29be-4e8a-5e05-b2a6-955f96ff5b08 (at 10.151.11.238@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899e0f25b400, cur 1592289922 expire 1592289772 last 1592289695 [1607854.150813] Lustre: Skipped 20 previous similar messages [1607978.993404] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1607979.027188] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.11.238@o2ib (351): c: 30, oc: 0, rc: 32 [1608111.279316] Lustre: MGS: Connection restored to b21df20f-ed61-07b2-d32b-dd33cfdcf98c (at 10.151.37.119@o2ib) [1608111.279322] Lustre: Skipped 349 previous similar messages [1608123.088813] Lustre: nbp8-MDT0000: haven't heard from client 9c8a2798-51dd-19f2-d774-4b005218a564 (at 10.151.4.155@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a40852c000, cur 1592290191 expire 1592290041 last 1592289964 [1608123.161487] Lustre: Skipped 1 previous similar message [1608203.001679] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1608203.035465] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.4.155@o2ib (306): c: 30, oc: 0, rc: 32 [1608924.566886] Lustre: MGS: Connection restored to 4e0983b4-35e2-d287-673e-bc844792e3d3 (at 10.151.44.133@o2ib) [1608924.566892] Lustre: Skipped 145 previous similar messages [1609546.263847] Lustre: MGS: Connection restored to a6375365-9cda-6aad-e7e9-5dd7e323f95b (at 10.151.38.141@o2ib) [1609546.263853] Lustre: Skipped 927 previous similar messages [1609666.055746] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1609666.089535] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.52.84@o2ib (303): c: 32, oc: 0, rc: 32 [1609754.448394] Process accounting resumed [1610252.949225] Lustre: MGS: Connection restored to fc2055b9-e023-00f7-7145-7c1326fd9dfa (at 10.141.6.138@o2ib417) [1610252.949230] Lustre: Skipped 377 previous similar messages [1610894.897882] Lustre: MGS: Connection restored to 781aa790-f031-de61-e376-f489a75d8eed (at 10.151.54.118@o2ib) [1610894.897887] Lustre: Skipped 7 previous similar messages [1611580.199376] Lustre: MGS: Connection restored to 15ed3306-ed93-d6ab-ebbe-87098322fe69 (at 10.151.47.74@o2ib) [1611580.199382] Lustre: Skipped 583 previous similar messages [1612053.233957] Lustre: nbp8-MDT0000: haven't heard from client d77e8b0e-cd11-6579-a2e1-816f04c77344 (at 10.151.18.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a21a273800, cur 1592294121 expire 1592293971 last 1592293894 [1612053.306916] Lustre: Skipped 1 previous similar message [1612154.147707] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1612154.181485] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.18.129@o2ib (327): c: 30, oc: 0, rc: 32 [1612271.658219] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1612271.658225] Lustre: Skipped 27 previous similar messages [1612962.130138] Lustre: MGS: Connection restored to 3e45a5c9-61b8-a2b5-40dc-584635afca5e (at 10.151.9.190@o2ib) [1612962.130144] Lustre: Skipped 15 previous similar messages [1613616.140851] Lustre: MGS: Connection restored to 5705739e-5730-cfb3-b04d-280b634fda4a (at 10.151.54.187@o2ib) [1613616.140856] Lustre: Skipped 479 previous similar messages [1614222.696736] Lustre: MGS: Connection restored to f644028e-39a9-e7fd-69a2-c2ca6a2f1c08 (at 10.151.12.99@o2ib) [1614222.696741] Lustre: Skipped 117 previous similar messages [1614865.219344] Lustre: MGS: Connection restored to 6b8db23c-3ce6-6b3a-98e7-6481b2b14e9a (at 10.151.0.41@o2ib) [1614865.219349] Lustre: Skipped 239 previous similar messages [1616122.180579] Lustre: MGS: Connection restored to 569ce80b-1a96-2de5-7f58-0f5ec839e1be (at 10.149.6.49@o2ib313) [1616122.180585] Lustre: Skipped 69 previous similar messages [1616282.731063] Lustre: MGS: Connection restored to c325219e-bd46-2b00-b384-c1da5a10713e (at 10.151.44.167@o2ib) [1616282.731069] Lustre: Skipped 1363 previous similar messages [1616612.707873] Lustre: MGS: Connection restored to 49f1dce2-6488-4c54-95f4-50b9154c58ea (at 10.141.2.186@o2ib417) [1616612.707879] Lustre: Skipped 5 previous similar messages [1617039.030587] Lustre: MGS: Connection restored to 3de2dd24-b5b4-3d65-a7ad-b2dab62dad78 (at 10.151.4.206@o2ib) [1617039.030593] Lustre: Skipped 123 previous similar messages [1618076.300282] Lustre: MGS: Connection restored to d792b5db-e078-5ec7-3be6-e8f4da1137d7 (at 10.149.9.83@o2ib313) [1618076.300286] Lustre: Skipped 91 previous similar messages [1618679.788655] Lustre: MGS: Connection restored to b8e2a3ab-8ef7-74b7-cf09-b205f410abc7 (at 10.151.3.32@o2ib) [1618679.788661] Lustre: Skipped 109 previous similar messages [1619300.126892] Lustre: MGS: Connection restored to 49f1dce2-6488-4c54-95f4-50b9154c58ea (at 10.141.2.186@o2ib417) [1619300.126898] Lustre: Skipped 671 previous similar messages [1620347.459148] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1620347.459154] Lustre: Skipped 77 previous similar messages [1620991.377919] Lustre: MGS: Connection restored to 78b5ea1e-5834-ebef-9b29-bc670016154d (at 10.151.28.152@o2ib) [1620991.377925] Lustre: Skipped 73 previous similar messages [1622036.965166] Lustre: MGS: Connection restored to 0f342fd9-2ba8-74db-4884-bc8c482015d5 (at 10.141.4.67@o2ib417) [1622036.965172] Lustre: Skipped 7 previous similar messages [1622651.111598] Lustre: MGS: Connection restored to c92480b6-3f2b-8128-5f0a-332752380cc0 (at 10.141.2.219@o2ib417) [1622651.111604] Lustre: Skipped 1429 previous similar messages [1623263.852128] Lustre: MGS: Connection restored to a131e984-750d-4219-aa56-18179fec8e5c (at 10.151.9.21@o2ib) [1623263.852135] Lustre: Skipped 183 previous similar messages [1624039.347926] Lustre: MGS: Connection restored to 21cc3b4c-0156-3298-6f5d-1d2c4f50cdb5 (at 10.141.6.16@o2ib417) [1624039.347932] Lustre: Skipped 281 previous similar messages [1625072.678794] Lustre: MGS: Connection restored to 49f1dce2-6488-4c54-95f4-50b9154c58ea (at 10.141.2.186@o2ib417) [1625072.678800] Lustre: Skipped 203 previous similar messages [1626537.663221] Lustre: MGS: Connection restored to 971bac28-46b8-82d0-f761-4c2b3823e65f (at 10.151.7.79@o2ib) [1626537.663226] Lustre: Skipped 1381 previous similar messages [1626706.689296] Lustre: MGS: Connection restored to 33d2242f-c131-e51a-3f9f-11cf79808631 (at 10.149.15.108@o2ib313) [1626706.689302] Lustre: Skipped 1 previous similar message [1627118.600911] Lustre: MGS: Connection restored to d167b6f4-d853-5996-4493-60f44baf09f8 (at 10.151.15.225@o2ib) [1627118.600917] Lustre: Skipped 295 previous similar messages [1628218.014528] Lustre: MGS: Connection restored to 11af63d7-1f37-88bd-f2da-ce1543f6d702 (at 10.151.28.160@o2ib) [1628218.014534] Lustre: Skipped 91 previous similar messages [1628392.320881] Lustre: MGS: Connection restored to bd9ba461-200b-1442-d888-cad91bededbd (at 10.149.6.51@o2ib313) [1628392.320885] Lustre: Skipped 1 previous similar message [1628502.945950] Lustre: MGS: Connection restored to afedfcec-07e5-48eb-17e2-27a832ce4550 (at 10.151.54.166@o2ib) [1628502.945955] Lustre: Skipped 61 previous similar messages [1628726.455473] Lustre: MGS: Connection restored to 2660ea28-e1b1-add1-9043-619691f48bce (at 10.151.29.116@o2ib) [1628726.455479] Lustre: Skipped 1265 previous similar messages [1629353.242527] Lustre: MGS: Connection restored to be4f7e0e-b040-c70c-5719-6d3c12143f42 (at 10.151.29.115@o2ib) [1629353.242532] Lustre: Skipped 39 previous similar messages [1630279.448625] Lustre: MGS: Connection restored to fe8e297d-24d4-abf3-7bff-c889113b3cc2 (at 10.149.1.50@o2ib313) [1630279.448631] Lustre: Skipped 3033 previous similar messages [1631118.026624] Lustre: MGS: Connection restored to bba846d2-8c01-15da-358b-2dce1ec5744e (at 10.151.3.133@o2ib) [1631118.026630] Lustre: Skipped 149 previous similar messages [1631829.960395] Lustre: nbp8-MDT0000: haven't heard from client f3725815-4b4d-4ee5-098d-2e5efd59e5de (at 10.151.28.211@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2a2fec000, cur 1592313897 expire 1592313747 last 1592313670 [1631830.033357] Lustre: Skipped 1 previous similar message [1631907.873862] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1631907.907648] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.211@o2ib (304): c: 29, oc: 0, rc: 32 [1631908.872958] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1631908.906728] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.213@o2ib (305): c: 29, oc: 0, rc: 32 [1631913.874026] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1631913.907809] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.222@o2ib (310): c: 29, oc: 0, rc: 32 [1631925.874538] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1631925.908332] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.247@o2ib (321): c: 29, oc: 0, rc: 32 [1631932.873827] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1631932.907622] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.103@o2ib (326): c: 29, oc: 0, rc: 32 [1632469.874213] Lustre: MGS: Connection restored to 7e3ab9b2-8eef-06ef-5ddc-65fabd3767a2 (at 10.151.28.97@o2ib) [1632469.874218] Lustre: Skipped 453 previous similar messages [1632588.916274] Lustre: MGS: Connection restored to 330ae797-ac9d-5edb-603d-41471ac48d7b (at 10.151.37.97@o2ib) [1632588.916279] Lustre: Skipped 29 previous similar messages [1632824.592787] Lustre: MGS: Connection restored to aedf05c2-7619-1a12-a5a6-fb97e5dcc9d6 (at 10.151.36.148@o2ib) [1632824.592793] Lustre: Skipped 31 previous similar messages [1633334.634397] Lustre: MGS: Connection restored to 0a00ca04-b5ad-9608-9de3-1021988971f8 (at 10.151.16.79@o2ib) [1633334.634402] Lustre: Skipped 107 previous similar messages [1633979.374642] Lustre: MGS: Connection restored to 26bb70bf-8749-628c-370a-bbacabfe99a5 (at 10.141.5.69@o2ib417) [1633979.374648] Lustre: Skipped 329 previous similar messages [1634669.347919] Lustre: MGS: Connection restored to 21e280de-7010-c78f-dc13-72f84da8e1cf (at 10.151.57.109@o2ib) [1634669.347925] Lustre: Skipped 317 previous similar messages [1635237.086025] Lustre: nbp8-MDT0000: haven't heard from client b3e3a856-d92c-13eb-ff12-229dc479373e (at 10.151.30.253@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899fd0a21c00, cur 1592317304 expire 1592317154 last 1592317077 [1635237.158991] Lustre: Skipped 13 previous similar messages [1635257.099506] Lustre: MGS: haven't heard from client 295689af-cf73-001b-68de-c31cee19acb2 (at 10.151.30.234@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8976aca44800, cur 1592317324 expire 1592317174 last 1592317097 [1635257.169897] Lustre: Skipped 2 previous similar messages [1635347.849582] Lustre: MGS: Connection restored to b387afc7-b4df-4c24-993f-a7f3a19322a3 (at 10.151.52.10@o2ib) [1635347.849588] Lustre: Skipped 449 previous similar messages [1635357.000541] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1635357.034325] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1635357.068111] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.234@o2ib (326): c: 30, oc: 0, rc: 32 [1635357.109316] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1635367.000798] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1635367.034589] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.253@o2ib (335): c: 30, oc: 0, rc: 32 [1635375.001118] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1635375.034922] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.31.215@o2ib (342): c: 29, oc: 0, rc: 32 [1635841.107340] Lustre: nbp8-MDT0000: haven't heard from client 5015a8e2-2b84-7faa-771f-3700f9675af0 (at 10.151.34.52@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3ae661400, cur 1592317908 expire 1592317758 last 1592317681 [1635841.180019] Lustre: Skipped 2 previous similar messages [1635917.110293] Lustre: nbp8-MDT0000: haven't heard from client 22688d9a-b838-0ef8-c993-8e0d83f325f7 (at 10.151.33.147@o2ib) in 197 seconds. I think it's dead, and I am evicting it. exp ffff8979cfe2dc00, cur 1592317984 expire 1592317834 last 1592317787 [1635917.183257] Lustre: Skipped 5 previous similar messages [1635930.021612] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1635930.055394] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.52@o2ib (315): c: 30, oc: 0, rc: 32 [1635981.023609] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1635981.057398] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.54@o2ib (314): c: 30, oc: 0, rc: 32 [1636051.026205] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1636051.059982] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1636051.093469] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.147@o2ib (331): c: 30, oc: 0, rc: 32 [1636051.134673] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1636178.864303] Lustre: MGS: Connection restored to 62e56b2b-7233-ea83-56f1-dacf03906d22 (at 10.151.31.175@o2ib) [1636178.864309] Lustre: Skipped 287 previous similar messages [1636792.660381] Lustre: MGS: Connection restored to 6e181442-c786-dad8-5372-f85caf7543a1 (at 10.149.14.43@o2ib313) [1636792.660387] Lustre: Skipped 649 previous similar messages [1637234.160081] Lustre: nbp8-MDT0000: haven't heard from client e0701692-d47b-158c-9324-acbff5888927 (at 10.151.29.236@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898b2534a400, cur 1592319301 expire 1592319151 last 1592319074 [1637234.233045] Lustre: Skipped 1 previous similar message [1637351.074316] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1637351.108101] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.236@o2ib (343): c: 29, oc: 0, rc: 32 [1637596.229516] Lustre: MGS: Connection restored to f14d00a5-3f98-b068-349f-f8bdf79eb12c (at 10.151.53.171@o2ib) [1637596.229521] Lustre: Skipped 627 previous similar messages [1638203.223311] Lustre: MGS: Connection restored to e2a31e0a-ac7d-7187-06a0-bb52068ce202 (at 10.151.47.210@o2ib) [1638203.223317] Lustre: Skipped 159 previous similar messages [1638857.568217] Lustre: MGS: Connection restored to 653d4952-d115-89cd-994d-8f43123b47fb (at 10.151.11.105@o2ib) [1638857.568223] Lustre: Skipped 101 previous similar messages [1639467.999437] Lustre: MGS: Connection restored to babe1f9d-10fb-5489-082f-7c3cb2fb07ed (at 10.151.56.52@o2ib) [1639467.999443] Lustre: Skipped 821 previous similar messages [1640076.946170] Lustre: MGS: Connection restored to 662551ea-4a1e-fce3-e23b-278fd99012ca (at 10.151.44.160@o2ib) [1640076.946176] Lustre: Skipped 199 previous similar messages [1640800.921433] Lustre: MGS: Connection restored to 420d6afe-be5e-1b6a-dcb3-64926f3c1c40 (at 10.141.5.193@o2ib417) [1640800.921439] Lustre: Skipped 193 previous similar messages [1640989.299535] Lustre: nbp8-MDT0000: haven't heard from client c9e80984-3245-ad44-5733-d0a7372d732e (at 10.151.35.185@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3808d8400, cur 1592323056 expire 1592322906 last 1592322829 [1640989.372496] Lustre: Skipped 1 previous similar message [1641073.211688] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1641073.245477] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.185@o2ib (310): c: 30, oc: 0, rc: 32 [1641458.994632] Lustre: MGS: Connection restored to 6a729034-e8b1-c021-fec0-ad6f1ab4539a (at 10.151.0.206@o2ib) [1641458.994638] Lustre: Skipped 15 previous similar messages [1641751.326710] Lustre: nbp8-MDT0000: haven't heard from client 9388a90d-8189-4600-4801-14cf5140434a (at 10.151.32.97@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a36226a000, cur 1592323818 expire 1592323668 last 1592323591 [1641751.399389] Lustre: Skipped 1 previous similar message [1641857.240517] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1641857.274313] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.97@o2ib (332): c: 30, oc: 0, rc: 32 [1642103.711847] Lustre: MGS: Connection restored to c6ff4a91-8ed3-6ad9-9546-136448eb8a14 (at 10.149.1.2@o2ib313) [1642103.711853] Lustre: Skipped 129 previous similar messages [1642485.353453] Lustre: nbp8-MDT0000: haven't heard from client b7a920f8-8477-10e2-6324-f6ee74b36819 (at 10.151.44.185@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898db06a3000, cur 1592324552 expire 1592324402 last 1592324325 [1642485.426428] Lustre: Skipped 1 previous similar message [1642579.266900] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1642579.300687] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.185@o2ib (320): c: 29, oc: 0, rc: 32 [1642901.016465] Lustre: MGS: Connection restored to f3c1b4c0-83de-84cc-a61c-d1570bc3ace1 (at 10.151.56.249@o2ib) [1642901.016471] Lustre: Skipped 349 previous similar messages [1643294.293104] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1643294.326890] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.156@o2ib (296): c: 32, oc: 0, rc: 32 [1643667.432309] Lustre: MGS: Connection restored to 1b3c2708-e86e-b1b6-d429-56c0fef74a72 (at 10.151.54.129@o2ib) [1643667.432315] Lustre: Skipped 209 previous similar messages [1644276.606701] Lustre: MGS: Connection restored to b8e2a3ab-8ef7-74b7-cf09-b205f410abc7 (at 10.151.3.32@o2ib) [1644276.606707] Lustre: Skipped 97 previous similar messages [1644435.426928] Lustre: MGS: haven't heard from client b6922902-13e4-d54c-943b-da717af72ca4 (at 10.151.59.253@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8982f1ae5c00, cur 1592326502 expire 1592326352 last 1592326275 [1644435.497327] Lustre: Skipped 1 previous similar message [1644533.338490] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1644533.372284] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.253@o2ib (324): c: 30, oc: 0, rc: 32 [1644953.881525] Lustre: MGS: Connection restored to d5fca56f-4975-7843-b8d8-e246d9a5254c (at 10.151.15.177@o2ib) [1644953.881530] Lustre: Skipped 115 previous similar messages [1645737.017345] Lustre: MGS: Connection restored to 36640225-a121-c81f-a222-5085697f5dc8 (at 10.151.45.165@o2ib) [1645737.017350] Lustre: Skipped 113 previous similar messages [1646241.490617] Lustre: nbp8-MDT0000: haven't heard from client 36d72b2f-1210-57e8-11ca-18373f4b80f6 (at 10.151.43.205@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974c4fc1c00, cur 1592328308 expire 1592328158 last 1592328081 [1646241.563578] Lustre: Skipped 1 previous similar message [1646339.656316] Lustre: MGS: Connection restored to 307cf496-e211-93a3-86aa-73c5df24de91 (at 10.149.1.98@o2ib313) [1646339.656322] Lustre: Skipped 3 previous similar messages [1646349.405175] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1646349.438969] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.43.205@o2ib (334): c: 30, oc: 0, rc: 32 [1647264.905531] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1647264.905537] Lustre: Skipped 85 previous similar messages [1647899.427023] Lustre: MGS: Connection restored to 5c68be17-be6c-a30b-7c74-cf882edc0679 (at 10.151.14.160@o2ib) [1647899.427028] Lustre: Skipped 109 previous similar messages [1648513.257384] Lustre: MGS: Connection restored to 8319b1f7-2374-4bcc-3b5c-a90318db2648 (at 10.141.4.13@o2ib417) [1648513.257390] Lustre: Skipped 725 previous similar messages [1648949.500026] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1648949.533812] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.28.154@o2ib (291): c: 32, oc: 0, rc: 32 [1649171.591393] Lustre: MGS: Connection restored to e61fbb03-2bb6-f29a-5b6d-d8d50670e00b (at 10.151.54.69@o2ib) [1649171.591399] Lustre: Skipped 27 previous similar messages [1649704.618025] Lustre: nbp8-MDT0000: haven't heard from client 60017343-93b3-26bf-1e13-76f92fc570e9 (at 10.151.45.22@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897dd6dbbc00, cur 1592331771 expire 1592331621 last 1592331544 [1649704.690707] Lustre: Skipped 1 previous similar message [1649715.628868] Lustre: MGS: haven't heard from client 827b2e8b-6a65-955e-ff4f-a3b1c28a0a6d (at 10.151.45.22@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f9a22d800, cur 1592331782 expire 1592331632 last 1592331555 [1649796.530992] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1649796.564764] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.45.22@o2ib (307): c: 31, oc: 0, rc: 32 [1649837.774783] Lustre: MGS: Connection restored to 937b3612-ecfc-9dd2-6a41-1abbdaa061a4 (at 10.151.18.183@o2ib) [1649837.774789] Lustre: Skipped 371 previous similar messages [1649902.624970] Lustre: nbp8-MDT0000: haven't heard from client 6bb1509d-4cf9-ce8f-26e6-a386cccfdaca (at 10.151.44.71@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a059e5e000, cur 1592331969 expire 1592331819 last 1592331742 [1649996.538198] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1649996.571984] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.44.71@o2ib (320): c: 30, oc: 0, rc: 32 [1650545.074317] Lustre: MGS: Connection restored to 6c542f42-2173-16bd-c803-4dd4722115e6 (at 10.151.2.232@o2ib) [1650545.074323] Lustre: Skipped 53 previous similar messages [1651395.736536] Lustre: MGS: Connection restored to 70cf6a7c-f63e-f0c3-d60a-3067cb032b8e (at 10.149.1.206@o2ib313) [1651395.736542] Lustre: Skipped 389 previous similar messages [1651684.601005] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1651684.634796] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.13@o2ib (289): c: 32, oc: 0, rc: 32 [1651701.600625] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1651701.634405] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.248@o2ib (286): c: 32, oc: 0, rc: 32 [1652072.756838] Lustre: MGS: Connection restored to e0cc6e20-bdcc-28dc-874b-a3ed559e77e5 (at 10.149.13.71@o2ib313) [1652072.756843] Lustre: Skipped 91 previous similar messages [1652732.900805] Lustre: MGS: Connection restored to b8e2a3ab-8ef7-74b7-cf09-b205f410abc7 (at 10.151.3.32@o2ib) [1652732.900811] Lustre: Skipped 237 previous similar messages [1653369.589801] Lustre: MGS: Connection restored to e8f4e8a8-72a7-0b09-4f97-fe721a271298 (at 10.151.7.84@o2ib) [1653369.589807] Lustre: Skipped 167 previous similar messages [1654072.616193] Lustre: MGS: Connection restored to 2a14c995-bd22-3f4d-2f79-1873d738fb33 (at 10.149.2.12@o2ib313) [1654072.616198] Lustre: Skipped 427 previous similar messages [1654775.803576] Lustre: MGS: haven't heard from client a52d0a81-8741-68d8-1e0d-87c4fce8ecf1 (at 10.151.37.173@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff899017e1e400, cur 1592336842 expire 1592336692 last 1592336615 [1654775.873936] Lustre: Skipped 1 previous similar message [1654857.717308] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1654857.751087] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.173@o2ib (307): c: 30, oc: 0, rc: 32 [1654944.542322] Lustre: MGS: Connection restored to 4b4d762e-5eca-07d3-7568-e2d079f1af4a (at 10.149.14.16@o2ib313) [1654944.542335] Lustre: Skipped 175 previous similar messages [1655405.736395] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1655405.770190] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.149@o2ib (279): c: 32, oc: 0, rc: 32 [1655470.828855] Lustre: nbp8-MDT0000: haven't heard from client 30d5bfe3-62b1-a7a0-f0b8-83d2ba513c1a (at 10.151.33.120@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8974febbec00, cur 1592337537 expire 1592337387 last 1592337310 [1655470.901818] Lustre: Skipped 1 previous similar message [1655577.658306] Lustre: MGS: Connection restored to 1ffbbb88-d90c-f314-3803-3ecc5ed9b5c6 (at 10.151.16.133@o2ib) [1655577.658311] Lustre: Skipped 81 previous similar messages [1655581.742858] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1655581.776639] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.120@o2ib (337): c: 30, oc: 0, rc: 32 [1656179.573248] Lustre: MGS: Connection restored to 48537540-8322-d13d-f9c1-105952acbe6e (at 10.141.4.3@o2ib417) [1656179.573254] Lustre: Skipped 127 previous similar messages [1656842.802889] Lustre: MGS: Connection restored to 73c9f535-fa25-1539-3d64-0acead3a9625 (at 10.151.17.183@o2ib) [1656842.802895] Lustre: Skipped 33 previous similar messages [1657360.808288] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1657360.842084] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.34.190@o2ib (270): c: 32, oc: 0, rc: 32 [1657366.808506] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1657366.842291] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.68@o2ib (303): c: 32, oc: 0, rc: 32 [1657429.810844] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1657429.844623] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.112@o2ib (303): c: 32, oc: 0, rc: 32 [1657590.732327] Lustre: MGS: Connection restored to 6603439a-f437-450f-a8e5-ee18c432bd24 (at 10.151.9.33@o2ib) [1657590.732332] Lustre: Skipped 193 previous similar messages [1658367.648424] Lustre: MGS: Connection restored to ff62bd13-e96d-4eb6-5ee3-e32457be4236 (at 10.151.1.203@o2ib) [1658367.648429] Lustre: Skipped 31 previous similar messages [1658944.866508] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1658944.900289] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.37.165@o2ib (303): c: 32, oc: 0, rc: 32 [1659076.849627] Lustre: MGS: Connection restored to 08472217-d939-c2c3-7d6a-db99b3620f0b (at 10.149.1.26@o2ib313) [1659076.849632] Lustre: Skipped 31 previous similar messages [1659418.973063] Lustre: nbp8-MDT0000: haven't heard from client 18898e85-4238-7d06-ab10-30c421c7c42c (at 10.151.36.93@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898479772000, cur 1592341485 expire 1592341335 last 1592341258 [1659419.045738] Lustre: Skipped 1 previous similar message [1659494.976631] Lustre: MGS: haven't heard from client e3c1bb68-4299-2f87-73e4-a4349e241f8b (at 10.151.29.150@o2ib) in 164 seconds. I think it's dead, and I am evicting it. exp ffff897cd9d00000, cur 1592341561 expire 1592341411 last 1592341397 [1659495.047019] Lustre: Skipped 3 previous similar messages [1659519.887705] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1659519.921493] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.117@o2ib (320): c: 30, oc: 0, rc: 32 [1659537.888448] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1659537.922244] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.93@o2ib (345): c: 30, oc: 0, rc: 32 [1659565.889489] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1659565.923268] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.38.142@o2ib (303): c: 32, oc: 0, rc: 32 [1659680.893646] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1659680.927432] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.29.150@o2ib (349): c: 30, oc: 0, rc: 32 [1659697.058970] Lustre: MGS: Connection restored to c985c367-f810-dc38-b847-879e8c138fb0 (at 10.151.32.33@o2ib) [1659697.058975] Lustre: Skipped 205 previous similar messages [1660200.003018] Lustre: nbp8-MDT0000: haven't heard from client 5cb5b600-0eb5-1f7c-abc0-5899f3f4d7ff (at 10.151.54.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898d37fbb800, cur 1592342266 expire 1592342116 last 1592342039 [1660200.075983] Lustre: Skipped 1 previous similar message [1660215.013107] Lustre: MGS: haven't heard from client 755bf389-5276-a9a8-4fd5-c95b880da21d (at 10.151.54.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3f8ee7400, cur 1592342281 expire 1592342131 last 1592342054 [1660294.917324] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1660294.951110] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.54.129@o2ib (306): c: 31, oc: 0, rc: 32 [1660354.377038] Lustre: MGS: Connection restored to 09397a7f-3fe2-2dc8-d25a-74d157cb2008 (at 10.151.50.72@o2ib) [1660354.377044] Lustre: Skipped 277 previous similar messages [1660474.852981] Lustre: 8579:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8979f9e16300 x1668974534442592/t1531807577764(0) o36->67852282-a3ed-5acb-a9e2-3cae43fe0406@10.151.0.201@o2ib:5/0 lens 488/3152 e 0 to 0 dl 1592342570 ref 2 fl Interpret:/0/0 rc 0/0 [1660474.952277] Lustre: 8579:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 134 previous similar messages [1660708.333617] LNet: Service thread pid 14118 was inactive for 551.86s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1660708.389992] LNet: Skipped 3 previous similar messages [1660708.407185] Pid: 14118, comm: mdt00_115 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1660708.407190] Call Trace: [1660708.407202] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1660708.412473] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1660708.412496] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1660708.412506] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1660708.412516] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1660708.412530] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1660708.412541] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1660708.412552] [] mdt_reint_rec+0x83/0x210 [mdt] [1660708.412562] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1660708.412571] [] mdt_reint+0x67/0x140 [mdt] [1660708.412623] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1660708.412659] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1660708.412691] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1660708.412696] [] kthread+0xd1/0xe0 [1660708.412699] [] ret_from_fork_nospec_end+0x0/0x39 [1660708.412722] [] 0xffffffffffffffff [1660708.412725] LustreError: dumping log to /tmp/lustre-log.1592342774.14118 [1660709.178343] LNet: Service thread pid 8575 was inactive for 552.75s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1660709.234458] Pid: 8575, comm: mdt00_043 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1660709.234459] Call Trace: [1660709.234472] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1660709.239700] [1660709.239728] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1660709.239749] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1660709.239763] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1660709.239773] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1660709.239785] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1660709.239797] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1660709.239809] [] mdt_reint_rec+0x83/0x210 [mdt] [1660709.239819] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1660709.239829] [] mdt_reint+0x67/0x140 [mdt] [1660709.239874] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1660709.239909] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1660709.239945] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1660709.239952] [] kthread+0xd1/0xe0 [1660709.239958] [] ret_from_fork_nospec_end+0x0/0x39 [1660709.239982] [] 0xffffffffffffffff [1660709.239988] Pid: 14083, comm: mdt00_088 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1660709.239988] Call Trace: [1660709.240020] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1660709.240050] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1660709.240060] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1660709.240071] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1660709.240082] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1660709.240093] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1660709.240104] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1660709.240116] [] mdt_reint_rec+0x83/0x210 [mdt] [1660709.240127] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1660709.240137] [] mdt_reint+0x67/0x140 [mdt] [1660709.240174] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1660709.240207] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1660709.240240] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1660709.240244] [] kthread+0xd1/0xe0 [1660709.240247] [] ret_from_fork_nospec_end+0x0/0x39 [1660709.240251] [] 0xffffffffffffffff [1660709.240254] Pid: 10527, comm: mdt00_069 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1660709.240254] Call Trace: [1660709.240285] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1660709.240313] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1660709.240325] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1660709.240335] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1660709.240344] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1660709.240357] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1660709.240369] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1660709.240380] [] mdt_reint_rec+0x83/0x210 [mdt] [1660709.240389] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1660709.240400] [] mdt_reint+0x67/0x140 [mdt] [1660709.240439] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1660709.240478] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1660709.240510] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1660709.240514] [] kthread+0xd1/0xe0 [1660709.240516] [] ret_from_fork_nospec_end+0x0/0x39 [1660709.240520] [] 0xffffffffffffffff [1660709.240523] Pid: 14084, comm: mdt00_089 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1660709.240523] Call Trace: [1660709.240554] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1660709.240585] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1660709.240598] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1660709.240610] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1660709.240620] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1660709.240631] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1660709.240647] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1660709.240657] [] mdt_reint_rec+0x83/0x210 [mdt] [1660709.240666] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1660709.240677] [] mdt_reint+0x67/0x140 [mdt] [1660709.240716] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1660709.240749] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1660709.240781] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1660709.240784] [] kthread+0xd1/0xe0 [1660709.240788] [] ret_from_fork_nospec_end+0x0/0x39 [1660709.240792] [] 0xffffffffffffffff [1660709.240796] LNet: Service thread pid 8568 was inactive for 552.81s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [1660709.240798] LNet: Skipped 6 previous similar messages [1660779.310962] Lustre: nbp8-MDT0000: Client d6ebca78-1698-ff71-cc82-e5c13a10c47a (at 10.151.0.204@o2ib) reconnecting [1660779.345326] Lustre: Skipped 172 previous similar messages [1660981.423716] LustreError: 8549:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592342222, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897a45721d40/0xa22cee3fa4b7a84d lrc: 3/0,1 mode: --/PW res: [0x3608b904f:0x844:0x0].0x0 bits 0x2/0x0 rrc: 119 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 8549 timeout: 0 lvb_type: 0 [1660981.554214] LustreError: 8549:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 56 previous similar messages [1661097.468022] Lustre: 10524:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff899b5d28bf00 x1668974534442608/t1531808847908(0) o36->67852282-a3ed-5acb-a9e2-3cae43fe0406@10.151.0.201@o2ib:628/0 lens 488/3152 e 0 to 0 dl 1592343193 ref 2 fl Interpret:/0/0 rc 0/0 [1661097.568195] Lustre: 10524:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 62 previous similar messages [1661163.733387] Lustre: MGS: Connection restored to e6e2d1f7-8309-42b3-f1d5-f8ed8080b6dd (at 10.151.18.62@o2ib) [1661163.733392] Lustre: Skipped 286 previous similar messages [1661402.238161] Lustre: nbp8-MDT0000: Client f526bc10-3412-a8ef-f1ec-57c0dec6eabe (at 10.151.0.185@o2ib) reconnecting [1661402.272535] Lustre: Skipped 8 previous similar messages [1661476.362007] LNet: Service thread pid 14109 was inactive for 696.92s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1661476.418393] LNet: Skipped 3 previous similar messages [1661476.435635] Pid: 14109, comm: mdt00_106 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1661476.435637] Call Trace: [1661476.435649] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1661476.440911] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1661476.440928] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1661476.440938] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1661476.440948] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1661476.440960] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1661476.440971] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1661476.440983] [] mdt_reint_rec+0x83/0x210 [mdt] [1661476.440993] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1661476.441008] [] mdt_reint+0x67/0x140 [mdt] [1661476.441054] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1661476.441088] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1661476.441120] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1661476.441126] [] kthread+0xd1/0xe0 [1661476.441130] [] ret_from_fork_nospec_end+0x0/0x39 [1661476.441155] [] 0xffffffffffffffff [1661476.441157] LustreError: dumping log to /tmp/lustre-log.1592343542.14109 [1661477.379386] LNet: Service thread pid 8579 was inactive for 697.98s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1661477.435494] Pid: 8579, comm: mdt00_047 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1661477.435495] Call Trace: [1661477.435508] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1661477.458805] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1661477.458826] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1661477.458836] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1661477.458846] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1661477.458860] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1661477.458871] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1661477.458882] [] mdt_reint_rec+0x83/0x210 [mdt] [1661477.458892] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1661477.458902] [] mdt_reint+0x67/0x140 [mdt] [1661477.458945] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1661477.458980] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1661477.459012] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1661477.459019] [] kthread+0xd1/0xe0 [1661477.459023] [] ret_from_fork_nospec_end+0x0/0x39 [1661477.459048] [] 0xffffffffffffffff [1661477.459062] Pid: 8555, comm: mdt00_035 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1661477.459065] Call Trace: [1661477.459097] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1661477.459132] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1661477.459149] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1661477.459166] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1661477.459180] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1661477.459198] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1661477.459216] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1661477.459232] [] mdt_reint_rec+0x83/0x210 [mdt] [1661477.459243] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1661477.459259] [] mdt_reint+0x67/0x140 [mdt] [1661477.459301] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1661477.459339] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1661477.459373] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1661477.459376] [] kthread+0xd1/0xe0 [1661477.459379] [] ret_from_fork_nospec_end+0x0/0x39 [1661477.459384] [] 0xffffffffffffffff [1661477.459391] Pid: 14121, comm: mdt00_118 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1661477.459392] Call Trace: [1661477.459423] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1661477.459451] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1661477.459463] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1661477.459473] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1661477.459482] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1661477.459495] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1661477.459507] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1661477.459518] [] mdt_reint_rec+0x83/0x210 [mdt] [1661477.459527] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1661477.459538] [] mdt_reint+0x67/0x140 [mdt] [1661477.459577] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1661477.459610] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1661477.459642] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1661477.459646] [] kthread+0xd1/0xe0 [1661477.459649] [] ret_from_fork_nospec_end+0x0/0x39 [1661477.459653] [] 0xffffffffffffffff [1661477.459656] Pid: 14067, comm: mdt00_075 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1661477.459656] Call Trace: [1661477.459687] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1661477.459717] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1661477.459730] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1661477.459740] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1661477.459750] [] mdt_reint_object_lock+0x2c/0x60 [mdt] [1661477.459763] [] mdt_reint_striped_lock+0x8c/0x510 [mdt] [1661477.459776] [] mdt_reint_setattr+0x676/0x1290 [mdt] [1661477.459787] [] mdt_reint_rec+0x83/0x210 [mdt] [1661477.459796] [] mdt_reint_internal+0x6e3/0xaf0 [mdt] [1661477.459808] [] mdt_reint+0x67/0x140 [mdt] [1661477.459847] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1661477.459879] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1661477.459913] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1661477.459916] [] kthread+0xd1/0xe0 [1661477.459920] [] ret_from_fork_nospec_end+0x0/0x39 [1661477.459923] [] 0xffffffffffffffff [1661477.459929] LNet: Service thread pid 8541 was inactive for 698.06s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [1661477.459931] LNet: Skipped 57 previous similar messages [1661478.410085] LNet: Service thread pid 8525 was inactive for 698.50s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [1661478.410087] LNet: Skipped 20 previous similar messages [1661478.410088] LustreError: dumping log to /tmp/lustre-log.1592343544.8525 [1661604.395768] LustreError: 14121:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592342845, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897a04edd580/0xa22cee3fa6a6c53f lrc: 3/0,1 mode: --/PW res: [0x3608b904f:0x844:0x0].0x0 bits 0x2/0x0 rrc: 136 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 14121 timeout: 0 lvb_type: 0 [1661604.526807] LustreError: 14121:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 20 previous similar messages [1661604.905748] LustreError: 8525:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592342845, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897c1f2bc5c0/0xa22cee3fa6a74545 lrc: 3/0,1 mode: --/PW res: [0x3608b904f:0x844:0x0].0x0 bits 0x2/0x0 rrc: 135 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 8525 timeout: 0 lvb_type: 0 [1661605.036237] LustreError: 8525:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 13 previous similar messages [1661666.058702] Lustre: nbp8-MDT0000: haven't heard from client f58e364f-e0cd-4927-7e75-650eba6e1fc8 (at 10.151.28.112@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8983a8464c00, cur 1592343732 expire 1592343582 last 1592343505 [1661721.107079] Lustre: 14105:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff898094bac800 x1668980190298800/t0(0) o101->8500cdd6-5107-c70b-2314-70bf72be3bc5@10.149.1.34@o2ib313:497/0 lens 4512/0 e 0 to 0 dl 1592343817 ref 2 fl New:/0/ffffffff rc 0/-1 [1661721.204706] Lustre: 14105:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 42 previous similar messages [1661742.060195] Lustre: nbp8-MDT0000: haven't heard from client 76730fec-e23b-1f8f-2552-fbe37bc1e3d7 (at 10.141.4.6@o2ib417) in 165 seconds. I think it's dead, and I am evicting it. exp ffff898d506a7c00, cur 1592343808 expire 1592343658 last 1592343643 [1661742.133168] Lustre: Skipped 21 previous similar messages [1661778.952358] Lustre: MGS: Connection restored to 40731e36-a628-16c5-70c1-8fefc3275247 (at 10.151.56.124@o2ib) [1661778.952364] Lustre: Skipped 149 previous similar messages [1661780.686776] Lustre: 8549:0:(service.c:2165:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (348:1276s); client may timeout. req@ffff898090922400 x1668974511336624/t1531807577464(0) o36->525530a9-ed22-eee4-952f-a907c2e9759f@10.151.0.59@o2ib:5/0 lens 488/424 e 0 to 0 dl 1592342570 ref 1 fl Complete:/0/0 rc 0/0 [1661780.686919] LNet: Service thread pid 14116 completed after 1624.23s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [1661780.686921] LNet: Skipped 121 previous similar messages [1661780.688880] LustreError: 8575:0:(service.c:2128:ptlrpc_server_handle_request()) @@@ Dropping timed-out request from 12345-10.149.6.166@o2ib313: deadline 349:14s ago req@ffff897a45641680 x1669205065911824/t0(0) o101->65f6fa1f-0398-6259-d521-40b99eafcd18@10.149.6.166@o2ib313:512/0 lens 4528/0 e 1 to 0 dl 1592343832 ref 1 fl Interpret:/0/ffffffff rc 0/-1 [1661780.688882] LustreError: 8575:0:(service.c:2128:ptlrpc_server_handle_request()) Skipped 394 previous similar messages [1661781.000334] Lustre: 8549:0:(service.c:2165:ptlrpc_server_handle_request()) Skipped 1849 previous similar messages [1661895.071095] Lustre: MGS: haven't heard from client 539c5359-7351-4775-a139-2d01f142b38d (at 10.151.48.64@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898d66ffdc00, cur 1592343961 expire 1592343811 last 1592343734 [1662018.980182] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 38 seconds [1662019.014246] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.48.64@o2ib (336): c: 21, oc: 0, rc: 32 [1662025.721803] Lustre: nbp8-MDT0000: Client 65d96295-eabe-2354-0f69-0938c16b8844 (at 10.151.0.206@o2ib) reconnecting [1662025.756159] Lustre: Skipped 8 previous similar messages [1662379.773423] Lustre: MGS: Connection restored to 9d0eb1c2-9f5d-5532-0edc-0512a98cfce2 (at 10.149.1.1@o2ib313) [1662379.773429] Lustre: Skipped 126 previous similar messages [1662755.099069] Lustre: nbp8-MDT0000: haven't heard from client 85aa860f-3f46-a5be-0f22-a9fbc005d43a (at 10.151.48.64@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898d27740000, cur 1592344821 expire 1592344671 last 1592344594 [1663027.213187] Lustre: MGS: Connection restored to 3e25300e-a165-177a-6fd2-836cedad28b5 (at 10.151.48.66@o2ib) [1663027.213193] Lustre: Skipped 87 previous similar messages [1663630.138656] Lustre: MGS: Connection restored to 8ca99ff2-77ca-34a8-8b58-ce92ead0bbd4 (at 10.141.4.9@o2ib417) [1663630.138660] Lustre: Skipped 129 previous similar messages [1664300.529848] Lustre: MGS: Connection restored to 44ccc1e4-d3dc-201d-5c06-4d4d15ad3c1b (at 10.151.28.154@o2ib) [1664300.529854] Lustre: Skipped 37 previous similar messages [1665027.647771] Lustre: MGS: Connection restored to f7cef802-a306-28ce-0249-7021960d1514 (at 10.151.32.124@o2ib) [1665027.647777] Lustre: Skipped 261 previous similar messages [1665656.661241] Lustre: MGS: Connection restored to 4aae11c3-46a4-5152-ed72-3d436bb2e62c (at 10.149.7.154@o2ib313) [1665656.661247] Lustre: Skipped 287 previous similar messages [1666385.122491] Lustre: MGS: Connection restored to e0a7e21b-c5dd-33bc-f44f-ab0b7434f2ca (at 10.151.36.175@o2ib) [1666385.122496] Lustre: Skipped 451 previous similar messages [1666533.236806] Lustre: nbp8-MDT0000: haven't heard from client 8bcf7ab8-e59c-71b3-4cef-b7e8e01ebd32 (at 10.151.57.130@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a304eadc00, cur 1592348599 expire 1592348449 last 1592348372 [1666638.150641] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1666638.184426] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.57.130@o2ib (331): c: 30, oc: 0, rc: 32 [1667132.739189] Lustre: MGS: Connection restored to 7fcd2184-f9da-50f7-46a6-5e8e8e5fe45d (at 10.149.4.234@o2ib313) [1667132.739195] Lustre: Skipped 129 previous similar messages [1667226.172293] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1667226.206073] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.232@o2ib (303): c: 32, oc: 0, rc: 32 [1667238.172739] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1667238.206543] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.1.11@o2ib (299): c: 32, oc: 0, rc: 32 [1667285.174342] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1667285.208130] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.59.204@o2ib (303): c: 32, oc: 0, rc: 32 [1667733.485182] Lustre: MGS: Connection restored to 3e5dd4d8-6bc6-cb66-b1d8-091472a18673 (at 10.151.33.124@o2ib) [1667733.485188] Lustre: Skipped 79 previous similar messages [1668742.609197] Lustre: MGS: Connection restored to 64bc51ad-3275-2eb9-38dd-991e9fdf0a9f (at 10.151.37.183@o2ib) [1668742.609203] Lustre: Skipped 367 previous similar messages [1669365.833389] Lustre: MGS: Connection restored to f049bff7-3ad8-7d91-aada-879c7d132788 (at 10.149.1.145@o2ib313) [1669365.833395] Lustre: Skipped 45 previous similar messages [1670015.492443] Lustre: MGS: Connection restored to aa2b6c34-50a8-b4cc-84b3-8c9f7ed46386 (at 10.151.32.158@o2ib) [1670015.492452] Lustre: Skipped 111 previous similar messages [1670617.642869] Lustre: MGS: Connection restored to de8dcc7d-7f0c-08aa-56e2-2b2c1a95787e (at 10.151.54.149@o2ib) [1670617.642875] Lustre: Skipped 97 previous similar messages [1671272.634696] Lustre: MGS: Connection restored to d60344de-4528-9d6c-a48d-6be9781dc6fd (at 10.151.32.39@o2ib) [1671272.634702] Lustre: Skipped 101 previous similar messages [1671960.492103] Lustre: MGS: Connection restored to a3fd8e6e-277e-a291-1a88-d4e46de5d08c (at 10.149.9.91@o2ib313) [1671960.492109] Lustre: Skipped 127 previous similar messages [1672650.712283] Lustre: MGS: Connection restored to e406ed8d-2bab-f64c-cf4d-c7e4bf4517c9 (at 10.151.12.47@o2ib) [1672650.712289] Lustre: Skipped 209 previous similar messages [1672972.480092] Lustre: MGS: haven't heard from client 3a53359f-65e3-0dc9-9751-9e14aabaa76f (at 10.151.3.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897d4b05ac00, cur 1592355038 expire 1592354888 last 1592354811 [1672972.550191] Lustre: Skipped 1 previous similar message [1672991.475257] Lustre: nbp8-MDT0000: haven't heard from client 66ca6019-09a1-9296-034a-5018c94da59a (at 10.151.3.129@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8973ace58400, cur 1592355057 expire 1592354907 last 1592354830 [1672991.547948] Lustre: Skipped 1 previous similar message [1673007.384275] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 34 seconds [1673007.418347] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.129@o2ib (241): c: 30, oc: 0, rc: 32 [1673090.388304] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1673090.422091] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.3.194@o2ib (317): c: 31, oc: 0, rc: 32 [1673265.080228] Lustre: MGS: Connection restored to 1a323d68-73bd-46da-dce1-bc105fd9d4b3 (at 10.151.12.137@o2ib) [1673265.080234] Lustre: Skipped 135 previous similar messages [1673879.935018] Lustre: MGS: Connection restored to 2b6f9495-7209-7ebb-e328-4088f0c416d7 (at 10.149.14.246@o2ib313) [1673879.935024] Lustre: Skipped 151 previous similar messages [1674879.750756] Lustre: MGS: Connection restored to 78f22f6c-b13a-d96a-afa1-66d75553e939 (at 10.151.4.40@o2ib) [1674879.750762] Lustre: Skipped 17 previous similar messages [1675492.625803] Lustre: MGS: Connection restored to 5f8e3240-b861-e061-bee7-48d0101208e2 (at 10.151.9.59@o2ib) [1675492.625808] Lustre: Skipped 11 previous similar messages [1676099.055876] Lustre: MGS: Connection restored to fb960295-1858-c86f-c9b1-e47cb007efd0 (at 10.151.28.105@o2ib) [1676099.055882] Lustre: Skipped 143 previous similar messages [1676781.423259] Lustre: MGS: Connection restored to 3eb4df1f-0daf-39a7-a6df-2fed1991f8c2 (at 10.151.28.109@o2ib) [1676781.423265] Lustre: Skipped 63 previous similar messages [1677532.516355] Lustre: MGS: Connection restored to 67f7f895-fb06-d917-3d30-89f4df0c823b (at 10.151.38.129@o2ib) [1677532.516361] Lustre: Skipped 145 previous similar messages [1678172.774645] Lustre: MGS: Connection restored to 89e2c365-fcb2-5a16-8e07-a8c10bd4f953 (at 10.149.14.5@o2ib313) [1678172.774650] Lustre: Skipped 129 previous similar messages [1678845.016906] Lustre: MGS: Connection restored to 00062ffe-9beb-0c95-4d8a-a286495ea877 (at 10.151.1.94@o2ib) [1678845.016912] Lustre: Skipped 345 previous similar messages [1679491.827713] Lustre: MGS: Connection restored to 064721c5-bfe1-78cb-6715-f8b67995f990 (at 10.149.9.35@o2ib313) [1679491.827719] Lustre: Skipped 731 previous similar messages [1680314.069348] Lustre: MGS: Connection restored to a3b24be3-e3de-90e9-0616-fb82a84e3a5b (at 10.151.51.34@o2ib) [1680314.069353] Lustre: Skipped 87 previous similar messages [1680439.438943] LustreError: 11-0: nbp8-OST00f3-osc-MDT0000: operation ost_statfs to node 10.151.27.71@o2ib failed: rc = -107 [1680439.439082] Lustre: nbp8-OST0023-osc-MDT0000: Connection to nbp8-OST0023 (at 10.151.27.71@o2ib) was lost; in progress operations using this service will wait for recovery to complete [1680439.439088] Lustre: Skipped 11 previous similar messages [1680439.547788] LustreError: Skipped 2 previous similar messages [1680439.589995] LustreError: 167-0: nbp8-OST010d-osc-MDT0000: This client was evicted by nbp8-OST010d; in progress operations using this service will fail. [1680939.357006] Lustre: MGS: Connection restored to e9a8a34e-f160-6bd6-f848-7bfd7429269d (at 10.149.2.72@o2ib313) [1680939.357011] Lustre: Skipped 74 previous similar messages [1681517.728932] LNet: 3587:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.52.182@o2ib version 12/12 incarnation 1591824193735513/1592363512171550 [1681566.697786] Lustre: MGS: Connection restored to 94e6b241-6e2e-9ab0-aa93-92260fe6f23e (at 10.149.1.219@o2ib313) [1681566.697792] Lustre: Skipped 51 previous similar messages [1682167.372275] Lustre: MGS: Connection restored to ed6580ba-dcd5-e426-5507-e68e87ae8288 (at 10.141.7.23@o2ib417) [1682167.372281] Lustre: Skipped 479 previous similar messages [1682606.735949] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1682606.769728] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.86@o2ib (303): c: 32, oc: 0, rc: 32 [1682781.777083] Lustre: MGS: Connection restored to 1cbdb607-eb00-feb8-cc7d-dfa4ef3f3ac9 (at 10.149.2.21@o2ib313) [1682781.777089] Lustre: Skipped 97 previous similar messages [1683513.563892] Lustre: MGS: Connection restored to 9eec93b6-c090-5306-656d-7f87f48e7b9b (at 10.151.38.123@o2ib) [1683513.563898] Lustre: Skipped 297 previous similar messages [1684273.593826] Lustre: MGS: Connection restored to 46d6d806-37b0-e7ca-5989-622052ace292 (at 10.151.32.140@o2ib) [1684273.593832] Lustre: Skipped 161 previous similar messages [1684922.586599] Lustre: MGS: Connection restored to 0d25fd87-5e8e-7ef9-a66b-9c62d49f02ac (at 10.151.35.26@o2ib) [1684922.586605] Lustre: Skipped 167 previous similar messages [1685579.982967] Lustre: MGS: Connection restored to f74a66f9-df4e-a310-9bf3-728ad0672516 (at 10.151.33.119@o2ib) [1685579.982972] Lustre: Skipped 167 previous similar messages [1686229.219295] Lustre: MGS: Connection restored to d60344de-4528-9d6c-a48d-6be9781dc6fd (at 10.151.32.39@o2ib) [1686229.219300] Lustre: Skipped 111 previous similar messages [1686969.105426] Lustre: MGS: Connection restored to 4c08829f-58b3-6900-39b9-b1a10c734de6 (at 10.151.28.145@o2ib) [1686969.105432] Lustre: Skipped 55 previous similar messages [1687688.126263] Lustre: MGS: Connection restored to 1288088f-168d-a1c7-ced4-7f8d94ba61c2 (at 10.141.2.118@o2ib417) [1687688.126268] Lustre: Skipped 11 previous similar messages [1688337.516095] Lustre: MGS: Connection restored to df30c234-47fc-75c3-2862-6b5920be9504 (at 10.151.12.97@o2ib) [1688337.516101] Lustre: Skipped 77 previous similar messages [1689190.907244] Lustre: MGS: Connection restored to addca060-27d6-54d9-48fc-8278e4d906f5 (at 10.149.10.23@o2ib313) [1689190.907248] Lustre: Skipped 175 previous similar messages [1689863.637321] Lustre: MGS: Connection restored to 2b6f9495-7209-7ebb-e328-4088f0c416d7 (at 10.149.14.246@o2ib313) [1689863.637326] Lustre: Skipped 15 previous similar messages [1690616.427484] Lustre: MGS: Connection restored to cbb4bb61-73ae-6e51-0006-90c03c62126b (at 10.151.28.77@o2ib) [1690616.427491] Lustre: Skipped 181 previous similar messages [1691570.564842] Lustre: MGS: Connection restored to 737c101f-ff18-11bc-3935-440aa82f8a1a (at 10.151.44.34@o2ib) [1691570.564848] Lustre: Skipped 15 previous similar messages [1692285.285113] Lustre: MGS: Connection restored to 2ec8060f-6e0b-63ce-f183-f0b409adfa2c (at 10.151.9.45@o2ib) [1692285.285119] Lustre: Skipped 15 previous similar messages [1692916.966737] Lustre: MGS: Connection restored to 5850ef56-bacd-1051-77b6-cc5ecacc103a (at 10.149.15.107@o2ib313) [1692916.966742] Lustre: Skipped 19 previous similar messages [1693649.612725] Lustre: MGS: Connection restored to bda2718b-1d41-d1dd-d72c-09cee1d41fe4 (at 10.151.55.135@o2ib) [1693649.612731] Lustre: Skipped 463 previous similar messages [1693711.238420] Lustre: nbp8-MDT0000: haven't heard from client cb3dff33-ebd4-8391-c26a-d4db9d3c4788 (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff898c3e2d8000, cur 1592375776 expire 1592375626 last 1592375549 [1693711.311990] Lustre: Skipped 1 previous similar message [1693729.237316] Lustre: MGS: haven't heard from client 727a3156-b83f-2a04-2cfb-6e5510ef0ebf (at 10.149.12.57@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff897f2c617400, cur 1592375794 expire 1592375644 last 1592375567 [1694316.500667] Lustre: MGS: Connection restored to 4bbc1fde-60c9-b48b-22fc-f2c4631d69b7 (at 10.151.55.45@o2ib) [1694316.500672] Lustre: Skipped 69 previous similar messages [1694984.703846] Lustre: MGS: Connection restored to 35bee4af-bdfa-8f5b-fc56-fe831377de89 (at 10.151.55.93@o2ib) [1694984.703852] Lustre: Skipped 613 previous similar messages [1695780.708583] Lustre: MGS: Connection restored to c01a53a3-bde8-2111-ccb0-8d13eafb77d9 (at 10.149.1.80@o2ib313) [1695780.708588] Lustre: Skipped 17 previous similar messages [1696420.859382] Lustre: MGS: Connection restored to 9ca20b9f-cfa3-7c87-ad34-10d3171fe6ae (at 10.151.51.11@o2ib) [1696420.859388] Lustre: Skipped 321 previous similar messages [1696457.636540] Process accounting resumed [1697995.301918] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1697995.335700] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.43@o2ib (299): c: 32, oc: 0, rc: 32 [1697996.301953] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1697996.335735] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1697996.369222] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.45@o2ib (304): c: 32, oc: 0, rc: 32 [1697996.410141] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1697998.301907] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1697998.335685] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 3 previous similar messages [1697998.369470] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.149@o2ib (281): c: 32, oc: 0, rc: 32 [1697998.410671] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 3 previous similar messages [1698001.302007] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1698001.335783] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1698001.369562] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.54@o2ib (304): c: 32, oc: 0, rc: 32 [1698001.410483] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1698056.304065] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1698056.337857] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 2 previous similar messages [1698056.371638] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.164@o2ib (304): c: 32, oc: 0, rc: 32 [1698056.412853] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 2 previous similar messages [1698290.451271] Lustre: MGS: Connection restored to 15ed3306-ed93-d6ab-ebbe-87098322fe69 (at 10.151.47.74@o2ib) [1698290.451277] Lustre: Skipped 95 previous similar messages [1698822.059404] Lustre: MGS: Connection restored to 1679909c-f283-613d-4ace-d95fceddd9fd (at 10.151.51.208@o2ib) [1698822.059409] Lustre: Skipped 7 previous similar messages [1698981.337897] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1698981.371680] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Skipped 1 previous similar message [1698981.405167] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.33.156@o2ib (300): c: 32, oc: 0, rc: 32 [1698981.446368] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Skipped 1 previous similar message [1699104.020353] Lustre: MGS: Connection restored to 754fd4ad-250f-da52-8c98-5066a1ca2eb2 (at 10.149.11.178@o2ib313) [1699104.020357] Lustre: Skipped 27 previous similar messages [1699518.330263] Lustre: MGS: Connection restored to 11aec4d9-9dae-f05d-e04c-d3e743cc22f1 (at 10.151.54.23@o2ib) [1699518.330268] Lustre: Skipped 121 previous similar messages [1699518.536411] LNet: 69410:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.54.37@o2ib version 12/12 incarnation 1591802588701684/1592381488072810 [1700197.849901] Lustre: MGS: Connection restored to 21a8d1f3-f7e5-1758-af01-c28781a6c765 (at 10.151.54.131@o2ib) [1700197.849908] Lustre: Skipped 77 previous similar messages [1700598.397103] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1700598.430895] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.35.151@o2ib (287): c: 32, oc: 0, rc: 32 [1700709.401246] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1700709.435040] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.17@o2ib (303): c: 32, oc: 0, rc: 32 [1700916.264740] Lustre: MGS: Connection restored to c21e97ae-e6b7-a97c-1e34-b80025cba84a (at 10.149.5.129@o2ib313) [1700916.264745] Lustre: Skipped 3 previous similar messages [1701548.117856] Lustre: MGS: Connection restored to 9063fab8-d512-b310-854b-1cf8d7028fd3 (at 10.149.14.1@o2ib313) [1701548.117861] Lustre: Skipped 751 previous similar messages [1701572.433720] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 2 seconds [1701572.467513] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.126@o2ib (285): c: 32, oc: 0, rc: 32 [1701629.434815] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1701629.468592] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.36.139@o2ib (303): c: 32, oc: 0, rc: 32 [1702107.452410] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1702107.486183] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.100@o2ib (286): c: 32, oc: 0, rc: 32 [1702173.454711] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1702173.488498] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.32.29@o2ib (302): c: 32, oc: 0, rc: 32 [1702699.072064] Lustre: MGS: Connection restored to 85222724-004f-61a4-5c14-81d3db44c5ca (at 10.151.3.39@o2ib) [1702699.072070] Lustre: Skipped 271 previous similar messages [1703403.944624] Lustre: MGS: Connection restored to 989212b1-00a4-50ab-2523-95d4e0ef98da (at 10.151.37.165@o2ib) [1703403.944630] Lustre: Skipped 661 previous similar messages [1704180.879177] Lustre: MGS: Connection restored to 9f4a12b6-7001-540f-b1c7-49b7a4c77bae (at 10.151.50.46@o2ib) [1704180.879183] Lustre: Skipped 61 previous similar messages [1704219.529615] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1704219.563403] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.2.187@o2ib (282): c: 32, oc: 0, rc: 32 [1704981.486093] Lustre: MGS: Connection restored to f1128db7-a1b5-2e57-ba56-92a2e025b867 (at 10.151.38.56@o2ib) [1704981.486098] Lustre: Skipped 5 previous similar messages [1705711.298356] Lustre: MGS: Connection restored to b9f94f6d-e7a7-7be7-b232-46efc4c33024 (at 10.151.52.92@o2ib) [1705711.298361] Lustre: Skipped 1 previous similar message [1706382.180556] Lustre: MGS: Connection restored to 064721c5-bfe1-78cb-6715-f8b67995f990 (at 10.149.9.35@o2ib313) [1706382.180561] Lustre: Skipped 66 previous similar messages [1707856.433666] Lustre: MGS: Connection restored to 3f03c423-1840-ea18-b183-5810f0854488 (at 10.149.15.123@o2ib313) [1707856.433671] Lustre: Skipped 136 previous similar messages [1707947.327187] Lustre: MGS: Connection restored to da34485e-ac06-af04-cb21-1e65c256a8a7 (at 10.149.9.189@o2ib313) [1707947.327193] Lustre: Skipped 59 previous similar messages [1708198.584685] Lustre: MGS: Connection restored to cebc4208-ab17-f2aa-b56a-685ad0e97a47 (at 10.149.11.50@o2ib313) [1708198.584691] Lustre: Skipped 59 previous similar messages [1708637.988796] Lustre: MGS: Connection restored to ec3ab769-223f-1867-089c-1469d913821e (at 10.151.28.59@o2ib) [1708637.988802] Lustre: Skipped 157 previous similar messages [1709516.073036] Lustre: MGS: Connection restored to 5e2d78f6-e5a2-32de-9ea8-06fd724c24b1 (at 10.149.8.65@o2ib313) [1709516.073042] Lustre: Skipped 483 previous similar messages [1710266.752189] Lustre: MGS: Connection restored to ea0e1eaf-c60c-b1b6-0cf6-5e46c5c4e68d (at 10.151.43.151@o2ib) [1710266.752195] Lustre: Skipped 259 previous similar messages [1710869.215964] Lustre: MGS: Connection restored to 841cf1ab-782c-6bfc-ed2f-baacffbb4609 (at 10.149.9.1@o2ib313) [1710869.215970] Lustre: Skipped 117 previous similar messages [1711527.918373] Lustre: MGS: Connection restored to 3f03c423-1840-ea18-b183-5810f0854488 (at 10.149.15.123@o2ib313) [1711527.918379] Lustre: Skipped 283 previous similar messages [1712258.731608] Lustre: MGS: Connection restored to 021360a6-5521-c8d4-5c13-166bdfe52cbe (at 10.149.14.17@o2ib313) [1712258.731614] Lustre: Skipped 123 previous similar messages [1713515.267083] Lustre: MGS: Connection restored to a150e7a1-c1c5-59f5-2edb-9ec578dcb214 (at 10.141.2.55@o2ib417) [1713515.267089] Lustre: Skipped 7 previous similar messages [1713659.458153] Lustre: MGS: Connection restored to 1ad03996-bd4d-dc9d-8702-4fa5a72a887b (at 10.149.3.123@o2ib313) [1713659.458159] Lustre: Skipped 103 previous similar messages [1713872.422569] Lustre: MGS: Connection restored to 4a6c80ae-ff59-fb66-d8f2-e34280309352 (at 10.149.3.69@o2ib313) [1713872.422575] Lustre: Skipped 1 previous similar message [1714679.187617] Lustre: MGS: Connection restored to a041e8bc-df77-f027-14ae-2d51c857c6e8 (at 10.151.33.200@o2ib) [1714679.187622] Lustre: Skipped 77 previous similar messages [1715332.556456] Lustre: MGS: Connection restored to 7a3f88cb-67f1-e00d-64d8-28fcdb6e9ff3 (at 10.151.32.86@o2ib) [1715332.556461] Lustre: Skipped 307 previous similar messages [1715948.553456] Lustre: MGS: Connection restored to 47fc7a36-f569-c415-5345-985f9af11063 (at 10.149.14.14@o2ib313) [1715948.553461] Lustre: Skipped 301 previous similar messages [1716576.612516] Lustre: MGS: Connection restored to caeb7583-63d5-2bad-2d03-7f3205fab2f3 (at 10.149.14.191@o2ib313) [1716576.612521] Lustre: Skipped 127 previous similar messages [1717270.649929] Lustre: MGS: Connection restored to 9570049d-2c13-9daf-a0f6-650a73255625 (at 10.149.9.105@o2ib313) [1717270.649934] Lustre: Skipped 109 previous similar messages [1718344.462907] Lustre: MGS: Connection restored to 1ad03996-bd4d-dc9d-8702-4fa5a72a887b (at 10.149.3.123@o2ib313) [1718344.462913] Lustre: Skipped 127 previous similar messages [1718947.070688] Lustre: MGS: Connection restored to 5d27c3ee-9ac6-d726-2e49-843e88e1f3e6 (at 10.151.35.186@o2ib) [1718947.070693] Lustre: Skipped 71 previous similar messages [1719672.549273] Lustre: MGS: Connection restored to e0f50835-2494-502f-06a0-cde678c617b6 (at 10.141.5.148@o2ib417) [1719672.549279] Lustre: Skipped 91 previous similar messages [1720588.938605] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [1720588.938611] Lustre: Skipped 21 previous similar messages [1721221.426037] Lustre: MGS: Connection restored to 3f83b21b-03e5-a508-d946-828a0828fac2 (at 10.151.36.139@o2ib) [1721221.426043] Lustre: Skipped 421 previous similar messages [1722021.685190] Lustre: MGS: Connection restored to edaba6e3-78f8-144e-e9db-c09208e37bb9 (at 10.151.30.236@o2ib) [1722021.685196] Lustre: Skipped 39 previous similar messages [1722758.624536] Lustre: MGS: Connection restored to e302294f-18ad-f68b-f18f-317eeaa1ea47 (at 10.141.6.70@o2ib417) [1722758.624542] Lustre: Skipped 345 previous similar messages [1723372.175916] Lustre: MGS: Connection restored to f5b799e8-b43b-41d2-8f60-25a01c95d4aa (at 10.151.36.87@o2ib) [1723372.175922] Lustre: Skipped 95 previous similar messages [1723998.518885] Lustre: MGS: Connection restored to 40e791bf-6a5f-b363-87d9-81f1842abf17 (at 10.149.15.98@o2ib313) [1723998.518890] Lustre: Skipped 71 previous similar messages [1724638.157814] Lustre: MGS: Connection restored to 2011d8eb-0d92-9f56-8746-8c659f519f86 (at 10.151.29.137@o2ib) [1724638.157819] Lustre: Skipped 1103 previous similar messages [1725244.227365] Lustre: MGS: Connection restored to 2c3cf03c-a5e4-3604-be1d-a5148cb632d2 (at 10.151.32.145@o2ib) [1725244.227371] Lustre: Skipped 153 previous similar messages [1725958.780847] Lustre: MGS: Connection restored to d4d3c562-fd68-3735-e4bd-3a7ba4004370 (at 10.151.33.27@o2ib) [1725958.780852] Lustre: Skipped 101 previous similar messages [1726634.229555] Lustre: MGS: Connection restored to 9fd728a5-912f-4d08-ead5-8e763e526542 (at 10.151.36.174@o2ib) [1726634.229561] Lustre: Skipped 309 previous similar messages [1727238.531746] Lustre: MGS: Connection restored to 4dcbae0d-e3c2-58e0-62b6-3c9f6d3f7e41 (at 10.151.0.77@o2ib) [1727238.531752] Lustre: Skipped 335 previous similar messages [1727847.945182] Lustre: MGS: Connection restored to 9370f8c3-9f59-68f4-782e-8b7d544b42ec (at 10.149.9.201@o2ib313) [1727847.945188] Lustre: Skipped 103 previous similar messages [1728647.381202] Lustre: MGS: Connection restored to eb95c421-6a2f-cc81-4928-4dc43feeea69 (at 10.151.39.70@o2ib) [1728647.381207] Lustre: Skipped 127 previous similar messages [1729399.390864] Lustre: MGS: Connection restored to d34a7f1e-aabe-16b7-559c-465c7c0a38c6 (at 10.151.28.208@o2ib) [1729399.390869] Lustre: Skipped 37 previous similar messages [1730009.244599] Lustre: MGS: Connection restored to 66fae42f-8bef-6f1b-6967-4279fb40aec1 (at 10.151.56.131@o2ib) [1730009.244606] Lustre: Skipped 69 previous similar messages [1730250.486811] LNetError: 4812:0:(o2iblnd_cb.c:3351:kiblnd_check_txs_locked()) Timed out tx: active_txs, 1 seconds [1730250.520605] LNetError: 4812:0:(o2iblnd_cb.c:3426:kiblnd_check_conns()) Timed out RDMA with 10.151.30.33@o2ib (299): c: 32, oc: 0, rc: 32 [1730650.761076] Lustre: MGS: Connection restored to 13cee315-7090-62b7-1a51-871b4aa1faa4 (at 10.151.32.193@o2ib) [1730650.761082] Lustre: Skipped 335 previous similar messages [1730835.599762] Lustre: MGS: haven't heard from client 1e932b72-a3dd-f47f-da45-47a804001c06 (at 10.151.27.18@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a3e6b63c00, cur 1592412899 expire 1592412749 last 1592412672 [1730852.604571] Lustre: nbp8-MDT0000: haven't heard from client c7b34585-cc74-1c2c-40fe-9075ac8b5089 (at 10.151.27.18@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89791d20a800, cur 1592412916 expire 1592412766 last 1592412689 [1731572.522112] Lustre: MGS: Connection restored to 2b506c31-eb65-784f-6409-3dc96b4dc60e (at 10.151.1.33@o2ib) [1731572.522118] Lustre: Skipped 313 previous similar messages [1732599.578090] Lustre: MGS: Connection restored to ecce0121-d1eb-dc13-6ad3-1e09eee96902 (at 10.151.56.123@o2ib) [1732599.578095] Lustre: Skipped 47 previous similar messages [1733156.398461] LNet: 3587:0:(o2iblnd_cb.c:2602:kiblnd_passive_connect()) Conn stale 10.151.27.18@o2ib version 12/12 incarnation 1591240360579877/1592415213938152 [1733545.696711] Lustre: nbp8-MDT0000: haven't heard from client 5997e6f3-c410-a94e-9c24-a0da28d411de (at 10.149.5.142@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a4071b0000, cur 1592415609 expire 1592415459 last 1592415382 [1733597.282981] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff898dedb43180 x1669086539350432/t0(0) o101->66395ede-9692-99a5-5d3d-e0ac4b5112e8@10.149.8.10@o2ib313:645/0 lens 576/3264 e 0 to 0 dl 1592415690 ref 2 fl Interpret:/0/0 rc 0/0 [1733597.380262] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 742 previous similar messages [1733621.703481] Lustre: nbp8-MDT0000: haven't heard from client 11340999-b905-b7d6-f2d7-9285f417ce91 (at 10.151.36.73@o2ib) in 182 seconds. I think it's dead, and I am evicting it. exp ffff89a218b9b400, cur 1592415685 expire 1592415535 last 1592415503 [1733621.776161] Lustre: Skipped 4 previous similar messages [1733637.364442] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8991f5a8bf00 x1669049263033280/t0(0) o39->5997e6f3-c410-a94e-9c24-a0da28d411de@10.149.5.142@o2ib313:685/0 lens 224/0 e 0 to 0 dl 1592415730 ref 2 fl New:/0/ffffffff rc 0/-1 [1733637.461725] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 856 previous similar messages [1733697.704079] Lustre: nbp8-MDT0000: haven't heard from client b76468c7-3989-b05d-9305-643cf2f5be62 (at 10.149.4.58@o2ib313) in 220 seconds. I think it's dead, and I am evicting it. exp ffff898f3f7fa800, cur 1592415761 expire 1592415611 last 1592415541 [1733712.535190] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8991bd70d580 x1669163757377968/t0(0) o101->eac32f30-d800-3921-5c6f-9013827f1450@10.151.37.27@o2ib:5/0 lens 4512/0 e 0 to 0 dl 1592415805 ref 2 fl New:/0/ffffffff rc 0/-1 [1733712.631614] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 655 previous similar messages [1733773.705474] Lustre: nbp8-MDT0000: haven't heard from client 83812c74-19f8-4e3f-67fc-e131af375fcb (at 10.151.23.129@o2ib) in 211 seconds. I think it's dead, and I am evicting it. exp ffff89904aac7c00, cur 1592415837 expire 1592415687 last 1592415626 [1733773.778449] Lustre: Skipped 37 previous similar messages [1733830.763535] LNet: Service thread pid 8583 was inactive for 550.95s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1733830.819635] LNet: Skipped 3 previous similar messages [1733830.836820] Pid: 8583, comm: mdt01_038 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1733830.836824] Call Trace: [1733830.836837] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1733830.842104] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1733830.842129] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1733830.842140] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1733830.842150] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1733830.842162] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1733830.842172] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1733830.842183] [] mdt_intent_policy+0x435/0xd80 [mdt] [1733830.842213] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1733830.842246] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1733830.842295] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1733830.842337] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1733830.842371] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1733830.842404] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1733830.842409] [] kthread+0xd1/0xe0 [1733830.842414] [] ret_from_fork_nospec_end+0x0/0x39 [1733830.842438] [] 0xffffffffffffffff [1733830.842441] LustreError: dumping log to /tmp/lustre-log.1592415894.8583 [1733831.659688] LNet: Service thread pid 10531 was inactive for 551.86s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1733831.716060] Pid: 10531, comm: mdt01_071 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1733831.716061] Call Trace: [1733831.716074] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1733831.739386] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1733831.739412] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1733831.739423] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1733831.739433] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1733831.739442] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1733831.739453] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1733831.739463] [] mdt_intent_policy+0x435/0xd80 [mdt] [1733831.739489] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1733831.739519] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1733831.739575] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1733831.739619] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1733831.739656] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1733831.739691] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1733831.739700] [] kthread+0xd1/0xe0 [1733831.739708] [] ret_from_fork_nospec_end+0x0/0x39 [1733831.739737] [] 0xffffffffffffffff [1733831.739742] Pid: 14089, comm: mdt01_118 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1733831.739746] Call Trace: [1733831.739777] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1733831.739804] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1733831.739816] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1733831.739826] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1733831.739836] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1733831.739847] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1733831.739858] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1733831.739868] [] mdt_intent_policy+0x435/0xd80 [mdt] [1733831.739893] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1733831.739922] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1733831.739961] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1733831.739998] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1733831.740031] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1733831.740063] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1733831.740067] [] kthread+0xd1/0xe0 [1733831.740069] [] ret_from_fork_nospec_end+0x0/0x39 [1733831.740073] [] 0xffffffffffffffff [1733831.740077] Pid: 12642, comm: mdt01_091 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1733831.740077] Call Trace: [1733831.740108] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1733831.740136] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1733831.740148] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1733831.740158] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1733831.740167] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1733831.740177] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1733831.740188] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1733831.740200] [] mdt_intent_policy+0x435/0xd80 [mdt] [1733831.740224] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1733831.740253] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1733831.740291] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1733831.740328] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1733831.740361] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1733831.740392] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1733831.740395] [] kthread+0xd1/0xe0 [1733831.740399] [] ret_from_fork_nospec_end+0x0/0x39 [1733831.740402] [] 0xffffffffffffffff [1733831.740405] Pid: 8616, comm: mdt01_066 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1733831.740406] Call Trace: [1733831.740436] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1733831.740464] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1733831.740476] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1733831.740485] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1733831.740495] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1733831.740505] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1733831.740517] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1733831.740531] [] mdt_intent_policy+0x435/0xd80 [mdt] [1733831.740555] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1733831.740584] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1733831.740622] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1733831.740658] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1733831.740692] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1733831.740722] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1733831.740725] [] kthread+0xd1/0xe0 [1733831.740727] [] ret_from_fork_nospec_end+0x0/0x39 [1733831.740733] [] 0xffffffffffffffff [1733831.740737] LNet: Service thread pid 12627 was inactive for 551.95s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [1733831.740739] LNet: Skipped 16 previous similar messages [1733832.811609] LNet: Service thread pid 12622 was inactive for 551.96s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [1733832.811611] LNet: Skipped 54 previous similar messages [1733832.811614] LustreError: dumping log to /tmp/lustre-log.1592415896.12622 [1733849.709090] Lustre: nbp8-MDT0000: haven't heard from client 945826d6-1cc2-c138-9c19-be2127c9e8c7 (at 10.151.23.132@o2ib) in 215 seconds. I think it's dead, and I am evicting it. exp ffff898844f47c00, cur 1592415913 expire 1592415763 last 1592415698 [1733849.782064] Lustre: Skipped 10 previous similar messages [1733862.822678] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8991e5774800 x1668974497630960/t0(0) o101->40911b12-9f8c-db4a-a3f4-9e466fb4fcfc@10.151.1.216@o2ib:156/0 lens 576/0 e 1 to 0 dl 1592415956 ref 2 fl New:/0/ffffffff rc 0/-1 [1733862.919383] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1036 previous similar messages [1733901.873070] Lustre: nbp8-MDT0000: Client 0ba6a333-f5c0-5356-5aec-6f99b1acdebd (at 10.149.7.39@o2ib313) reconnecting [1733901.907994] Lustre: Skipped 61 previous similar messages [1733901.926080] Lustre: nbp8-MDT0000: Connection restored to cdf18a0f-9a51-219f-ba03-6b67c78c4853 (at 10.149.7.39@o2ib313) [1733901.926087] Lustre: Skipped 47 previous similar messages [1733925.710333] Lustre: nbp8-MDT0000: haven't heard from client fae4f8e2-306d-e882-eb5d-07df5dc200b8 (at 10.151.37.155@o2ib) in 225 seconds. I think it's dead, and I am evicting it. exp ffff8986343c4400, cur 1592415989 expire 1592415839 last 1592415764 [1733925.783298] Lustre: Skipped 2 previous similar messages [1733939.745110] Lustre: nbp8-MDT0000: Client 03ce6a1a-6a6b-2649-dd95-a57298fc7fda (at 10.151.6.36@o2ib) reconnecting [1733939.779171] Lustre: Skipped 160 previous similar messages [1733977.470727] Lustre: MGS: Connection restored to 06ca9284-da8c-feaf-8520-b650ed3d379a (at 10.151.8.35@o2ib) [1733977.470733] Lustre: Skipped 254 previous similar messages [1734001.712802] Lustre: nbp8-MDT0000: haven't heard from client d8f25440-1dfb-1352-9765-08ded09334aa (at 10.151.37.61@o2ib) in 157 seconds. I think it's dead, and I am evicting it. exp ffff8986a620dc00, cur 1592416065 expire 1592415915 last 1592415908 [1734001.785511] Lustre: Skipped 13 previous similar messages [1734016.761994] Lustre: nbp8-MDT0000: Client eac32f30-d800-3921-5c6f-9013827f1450 (at 10.151.37.27@o2ib) reconnecting [1734016.796351] Lustre: Skipped 99 previous similar messages [1734077.718774] Lustre: nbp8-MDT0000: haven't heard from client 62d3fc4d-d60c-4dbf-ae7d-d583c5c419e1 (at 10.151.34.52@o2ib) in 192 seconds. I think it's dead, and I am evicting it. exp ffff89858dee5000, cur 1592416141 expire 1592415991 last 1592415949 [1734104.795516] LustreError: 7286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592415343, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff89918b689f80/0xa22cee4044517cdc lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 334 type: IBT flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 7286 timeout: 0 lvb_type: 0 [1734104.925683] LustreError: 7286:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 19 previous similar messages [1734105.458542] LustreError: 8631:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592415343, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff898a93669f80/0xa22cee4044522c37 lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 334 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 8631 timeout: 0 lvb_type: 0 [1734105.588750] LustreError: 8631:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 28 previous similar messages [1734128.009849] Lustre: nbp8-MDT0000: Connection restored to 729ebfc5-544c-a648-aaf1-77306e42f8c5 (at 10.151.36.128@o2ib) [1734128.009854] Lustre: Skipped 239 previous similar messages [1734153.719568] Lustre: nbp8-MDT0000: haven't heard from client edbfca03-3409-4d90-fedf-37f430169f98 (at 10.149.1.79@o2ib313) in 209 seconds. I think it's dead, and I am evicting it. exp ffff898d42289400, cur 1592416217 expire 1592416067 last 1592416008 [1734165.471751] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8993303d8000 x1669041722258448/t0(0) o39->329cb904-1e19-8b4c-cf6a-6660ff47c5dc@10.151.6.226@o2ib:458/0 lens 224/0 e 0 to 0 dl 1592416258 ref 2 fl New:/0/ffffffff rc 0/-1 [1734165.567891] Lustre: 7282:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 100 previous similar messages [1734168.369067] Lustre: nbp8-MDT0000: Client 40911b12-9f8c-db4a-a3f4-9e466fb4fcfc (at 10.151.1.216@o2ib) reconnecting [1734168.403417] Lustre: Skipped 150 previous similar messages [1734290.153199] LustreError: 8077:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 718s [1734307.739638] Lustre: nbp8-MDT0000: haven't heard from client 214f0134-faba-649a-f177-0a63ec98ae59 (at 10.151.34.143@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89985c23a000, cur 1592416371 expire 1592416221 last 1592416144 [1734307.812609] Lustre: Skipped 10 previous similar messages [1734431.009759] Lustre: nbp8-MDT0000: Connection restored to 33d2242f-c131-e51a-3f9f-11cf79808631 (at 10.149.15.108@o2ib313) [1734431.009764] Lustre: Skipped 95 previous similar messages [1734470.391532] LustreError: 8835:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 898s [1734471.271685] Lustre: nbp8-MDT0000: Client 28a4f7e2-f02c-23ec-27be-833bf3eeab69 (at 10.151.5.25@o2ib) reconnecting [1734471.305755] Lustre: Skipped 35 previous similar messages [1734759.740985] Lustre: nbp8-MDT0000: haven't heard from client 9c63e84a-7b49-4fb8-ed0b-0627921b3a3d (at 10.149.9.91@o2ib313) in 227 seconds. I think it's dead, and I am evicting it. exp ffff8990077c6c00, cur 1592416823 expire 1592416673 last 1592416596 [1734759.814489] Lustre: Skipped 14 previous similar messages [1734769.761857] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8991b4e39200 x1669115549137312/t0(0) o101->e184e0fb-473c-03a2-5ac3-d3780676a78b@10.149.11.215@o2ib313:308/0 lens 576/0 e 0 to 0 dl 1592416863 ref 2 fl New:/0/ffffffff rc 0/-1 [1734769.859706] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1944 previous similar messages [1735034.916876] Lustre: nbp8-MDT0000: Connection restored to 1c02c475-14e6-44be-d6c6-739117dbd963 (at 10.151.59.213@o2ib) [1735034.916881] Lustre: Skipped 619 previous similar messages [1735075.538183] Lustre: nbp8-MDT0000: Client e184e0fb-473c-03a2-5ac3-d3780676a78b (at 10.149.11.215@o2ib313) reconnecting [1735075.573674] Lustre: Skipped 540 previous similar messages [1735078.042090] LNet: Service thread pid 14081 was inactive for 550.73s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1735078.098472] LNet: Skipped 3 previous similar messages [1735078.115673] Pid: 14081, comm: mdt00_086 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1735078.115677] Call Trace: [1735078.115691] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1735078.120966] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1735078.120988] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1735078.120999] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1735078.121009] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1735078.121018] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1735078.121029] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1735078.121039] [] mdt_intent_policy+0x435/0xd80 [mdt] [1735078.121064] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1735078.121094] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1735078.121146] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1735078.121186] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1735078.121219] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1735078.121250] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1735078.121256] [] kthread+0xd1/0xe0 [1735078.121259] [] ret_from_fork_nospec_end+0x0/0x39 [1735078.121283] [] 0xffffffffffffffff [1735078.121286] LustreError: dumping log to /tmp/lustre-log.1592417141.14081 [1735078.980762] LNet: Service thread pid 8530 was inactive for 551.67s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1735079.036848] Pid: 8530, comm: mdt00_022 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1735079.036850] Call Trace: [1735079.036863] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1735079.060163] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1735079.060191] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1735079.060202] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1735079.060212] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1735079.060222] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1735079.060232] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1735079.060242] [] mdt_intent_policy+0x435/0xd80 [mdt] [1735079.060267] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1735079.060299] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1735079.060349] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1735079.060389] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1735079.060422] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1735079.060454] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1735079.060458] [] kthread+0xd1/0xe0 [1735079.060462] [] ret_from_fork_nospec_end+0x0/0x39 [1735079.060487] [] 0xffffffffffffffff [1735190.222015] LustreError: 10113:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 602s [1735352.313155] LustreError: 8530:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592416590, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff897cc4a15c40/0xa22cee40458d1a86 lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 339 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 8530 timeout: 0 lvb_type: 0 [1735352.443341] LustreError: 8530:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) Skipped 104 previous similar messages [1735370.456666] LustreError: 10375:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 782s [1735375.028012] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff89942969b180 x1668977129372752/t0(0) o101->1a05bdd0-fad2-f66f-f29d-1d009da267a4@10.151.16.154@o2ib:158/0 lens 576/0 e 2 to 0 dl 1592417468 ref 2 fl New:/2/ffffffff rc 0/-1 [1735375.125006] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1926 previous similar messages [1735435.769792] Lustre: nbp8-MDT0000: haven't heard from client a0eebb83-f273-e9ca-f04a-205407bdf5bd (at 10.151.32.100@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff89a2e83a3400, cur 1592417499 expire 1592417349 last 1592417272 [1735435.842770] Lustre: Skipped 44 previous similar messages [1735550.695233] LustreError: 10630:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 962s [1735635.943043] Lustre: nbp8-MDT0000: Connection restored to 9e00432c-6d8f-8557-8f0b-1596b413df2b (at 10.149.12.97@o2ib313) [1735635.943048] Lustre: Skipped 658 previous similar messages [1735680.485715] Lustre: nbp8-MDT0000: Client 1a05bdd0-fad2-f66f-f29d-1d009da267a4 (at 10.151.16.154@o2ib) reconnecting [1735680.520383] Lustre: Skipped 571 previous similar messages [1735730.929726] LustreError: 10876:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1142s [1735910.172387] LustreError: 11244:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1322s [1735980.296157] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/30), not sending early reply req@ffff8992c9e11b00 x1669343847086976/t0(0) o101->8bc31fdc-e04b-e241-4399-c121f5caff47@10.141.6.183@o2ib417:8/0 lens 584/0 e 1 to 0 dl 1592418073 ref 2 fl New:/2/ffffffff rc 0/-1 [1735980.393160] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1218 previous similar messages [1736090.410842] LustreError: 11510:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1502s [1736240.235044] Lustre: nbp8-MDT0000: Connection restored to 9bde665a-7b50-d628-3a6d-766a2b0deb49 (at 10.151.15.25@o2ib) [1736240.235049] Lustre: Skipped 601 previous similar messages [1736285.041652] Lustre: nbp8-MDT0000: Client 1f465a23-dc5e-ba3f-f1db-394df16ee6c0 (at 10.141.6.158@o2ib417) reconnecting [1736285.076875] Lustre: Skipped 471 previous similar messages [1736325.318737] LNet: Service thread pid 8565 was inactive for 551.51s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [1736325.374836] Pid: 8565, comm: mdt00_039 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 SMP Tue Mar 17 13:32:19 PDT 2020 [1736325.374838] Call Trace: [1736325.374852] [] ldlm_completion_ast+0x430/0x860 [ptlrpc] [1736325.398161] [] ldlm_cli_enqueue_local+0x231/0x830 [ptlrpc] [1736325.398181] [] mdt_object_local_lock+0x50b/0xb20 [mdt] [1736325.398191] [] mdt_object_lock_internal+0x70/0x360 [mdt] [1736325.398201] [] mdt_object_lock_try+0x27/0xb0 [mdt] [1736325.398210] [] mdt_getattr_name_lock+0x1277/0x1c30 [mdt] [1736325.398221] [] mdt_intent_getattr+0x2b5/0x480 [mdt] [1736325.398231] [] mdt_intent_policy+0x435/0xd80 [mdt] [1736325.398255] [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc] [1736325.398286] [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc] [1736325.398331] [] tgt_enqueue+0x62/0x210 [ptlrpc] [1736325.398370] [] tgt_request_handle+0xada/0x1570 [ptlrpc] [1736325.398403] [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc] [1736325.398434] [] ptlrpc_main+0xb34/0x1470 [ptlrpc] [1736325.398440] [] kthread+0xd1/0xe0 [1736325.398445] [] ret_from_fork_nospec_end+0x0/0x39 [1736325.398470] [] 0xffffffffffffffff [1736325.398471] LustreError: dumping log to /tmp/lustre-log.1592418388.8565 [1736450.848084] LustreError: 12578:0:(service.c:3361:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1862s [1736450.885877] LustreError: 12578:0:(service.c:3361:ptlrpc_svcpt_health_check()) Skipped 1 previous similar message [1736580.496134] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (30/-125), not sending early reply req@ffff899aa3a1d580 x1669030209379968/t0(0) o101->29e9480a-ada2-cd39-cdb4-881b0782ae58@10.149.8.106@o2ib313:608/0 lens 576/0 e 0 to 0 dl 1592418673 ref 2 fl New:/0/ffffffff rc 0/-1 [1736580.594281] Lustre: 14060:0:(service.c:1372:ptlrpc_at_send_early_reply()) Skipped 1683 previous similar messages [1736598.816763] LustreError: 8565:0:(ldlm_request.c:129:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1592417837, 825s ago); not entering recovery in server code, just going back to sleep ns: mdt-nbp8-MDT0000_UUID lock: ffff89838a43f980/0xa22cee4046a11c08 lrc: 3/1,0 mode: --/PR res: [0x3608b98c3:0x4:0x0].0x0 bits 0x13/0x8 rrc: 342 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 8565 timeout: 0 lvb_type: 0 [1736840.768134] Lustre: MGS: Connection restored to 02eaa079-4544-1890-7c12-58e638416bfb (at 10.151.59.210@o2ib) [1736840.768139] Lustre: Skipped 634 previous similar messages [1736885.794635] Lustre: nbp8-MDT0000: Client 29e9480a-ada2-cd39-cdb4-881b0782ae58 (at 10.149.8.106@o2ib313) reconnecting [1736885.829853] Lustre: Skipped 579 previous similar messages [1736944.416693] SysRq : Trigger a crash [1736944.428771] BUG: unable to handle kernel NULL pointer dereference at (null) [1736944.455120] IP: [] sysrq_handle_crash+0x16/0x20 [1736944.475743] PGD 8000003fa9f90067 PUD 3fabea2067 PMD 0 [1736944.493236] Oops: 0002 [#1] SMP [1736944.504427] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) scsi_transport_iscsi rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) bonding sunrpc dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper mgag200 cryptd ttm pcspkr drm_kms_helper igb syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm lpc_ich mei_me pps_core i2c_algo_bit joydev mei drm_panel_orientation_quirks i2c_i801 dca wmi ipmi_si ipmi_devintf [1736944.740040] ipmi_msghandler pcc_cpufreq dm_round_robin dm_multipath binfmt_misc tcp_bic ip_tables dm_mod virtio_scsi virtio_ring virtio xfs libcrc32c ext4 mbcache jbd2 isci ahci mpt2sas libsas sd_mod libahci raid_class libata sg scsi_transport_sas lpfc(OE) nvmet_fc nvmet nvme_fc mlx4_core(OE) nvme_fabrics nvme_core scsi_transport_fc scsi_tgt crc_t10dif crct10dif_generic crct10dif_pclmul mlx_compat(OE) crc32c_intel crct10dif_common devlink [last unloaded: i2c_algo_bit] [1736944.873258] CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G OE ------------ 3.10.0-1062.12.1.el7_lustre2124.x86_64 #1 [1736944.914458] Hardware name: SGI.COM SUMMIT/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014 [1736944.945648] task: ffff89855397e2a0 ti: ffff898553a20000 task.ti: ffff898553a20000 [1736944.970839] RIP: 0010:[] [] sysrq_handle_crash+0x16/0x20 [1736944.999471] RSP: 0018:ffff89a47e403d98 EFLAGS: 00010046 [1736945.017501] RAX: ffffffff85c6ffb0 RBX: ffffffff864e6560 RCX: 0000000000000000 [1736945.041539] RDX: 0000000000000000 RSI: ffff89a47e413898 RDI: 0000000000000063 [1736945.065580] RBP: ffff89a47e403d98 R08: ffffffff868018bc R09: ffffffff868277bb [1736945.089618] R10: 0000000000002b78 R11: 0000000000002b77 R12: 0000000000000063 [1736945.113656] R13: 0000000000000001 R14: 0000000000000006 R15: 0000000000000063 [1736945.137693] FS: 0000000000000000(0000) GS:ffff89a47e400000(0000) knlGS:0000000000000000 [1736945.164876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1736945.184345] CR2: 0000000000000000 CR3: 0000003fbd70e000 CR4: 00000000000607e0 [1736945.208406] Call Trace: [1736945.217004] [1736945.223899] [] __handle_sysrq+0x10d/0x170 [1736945.243659] [] handle_sysrq+0x26/0x30 [1736945.261698] [] serial8250_rx_chars+0xf7/0x200 [1736945.282017] [] serial8250_handle_irq.part.13+0x6a/0xb0 [1736945.304913] [] serial8250_default_handle_irq+0x2d/0x30 [1736945.327813] [] serial8250_interrupt+0x68/0xe0 [1736945.348163] [] __handle_irq_event_percpu+0x44/0x1c0 [1736945.370200] [] handle_irq_event_percpu+0x32/0x80 [1736945.391380] [] handle_irq_event+0x3c/0x60 [1736945.410562] [] handle_edge_irq+0x7f/0x150 [1736945.429742] [] handle_irq+0xe4/0x1a0 [1736945.447495] [] ? tick_check_idle+0x8c/0xd0 [1736945.466988] [] do_IRQ+0x4d/0xf0 [1736945.483333] [] common_interrupt+0x16a/0x16a [1736945.503081] [1736945.509961] [] ? cpuidle_enter_state+0x54/0xd0 [1736945.531160] [] cpuidle_idle_call+0xde/0x230 [1736945.550914] [] arch_cpu_idle+0xe/0xc0 [1736945.568972] [] cpu_startup_entry+0x14a/0x1e0 [1736945.589009] [] start_secondary+0x1f7/0x270 [1736945.608475] [] start_cpu+0x5/0x14 [1736945.625363] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 48 89 e5 c7 05 91 77 7d 00 01 00 00 00 0f ae f8 04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 0e [1736945.688719] RIP [] sysrq_handle_crash+0x16/0x20 [1736945.709622] RSP [1736945.721647] CR2: 0000000000000000